This notebook is written with reticulate, a package that allows inter-operation between R and Python.


1 Motivation

Why do we need to explain a machine learning model? The benefit of an explainable model over a black-box model is that the model can be trusted. Trust is important in many real applications where the successful deployment of a machine learning model requires buy-in from end users. Sometimes trust plays an even bigger role than model accuracy.

Other than trust, model explainability (or interpretability, used interchangeably hereafter) may also guide us in the right direction to further improve the model.1

In general, a linear model is more interpretable than a non-linear one, but it also tends to suffer from lower accuracy. More advanced, and hence more complicated, models usually have worse interpretability.

One should not confuse model explainability with actual causality. Being able to explain a model doesn’t mean we can identify any ground-truth causal relation behind it. Model explainability is for, and only for, the model, not for the facts we’d like to model. Nevertheless, understanding how we can reason about the model will definitely help us better model the actual pattern behind the scenes.

In this notebook we will walk through 3 popular approaches to model prediction explanation, each of which comes with a dedicated Python package:

  1. shap
  2. lime
  3. interpret

2 Explanation Models

An explanation model \(g(x)\) is an interpretable approximation of the original model \(f(x)\). Its sole purpose is to provide the extra explainability that the original model, due to its complexity, fails to offer.

The general idea is to use a simplified input \(x'\) such that \(x = h_x(x')\), where \(h_x(\cdot)\) is a mapping function for any given raw input \(x\). Then the interpretable approximation can be written as:

\[ g(x') \approx f(h_x(x')). \]

The additive feature attribution methods specify the explanation model of the following form:

\[ g(x') = \phi_0 + \sum_{i = 1}^m \phi_i x_i', \]

where \(m\) is the total number of simplified features and \(x_i' \in \{0, 1\}\) is simply an indicator.2 Apparently, the choice of an additive model is for the sake of (linear) interpretability. The simplified features are an interpretable representation of the original model features.
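As a toy numeric illustration (all attribution values \(\phi\) below are made up), the additive form decomposes a prediction into a base value plus one contribution per present simplified feature:

```python
import numpy as np

# Hypothetical attribution values for m = 3 simplified binary features.
phi_0 = 0.5                        # Base value (e.g. expected model output).
phi = np.array([0.2, -0.3, 0.1])   # Per-feature contributions.
x_prime = np.array([1, 0, 1])      # Simplified input: which features are "on".

# g(x') = phi_0 + sum_i phi_i * x'_i
g = phi_0 + phi @ x_prime
print(g)  # ~0.8: only the first and third features contribute.
```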

3 LIME

One very popular additive model of this kind is LIME (Ribeiro, Singh, and Guestrin (2016)). LIME stands for Local Interpretable Model-Agnostic Explanations. As its full name suggests, LIME can be applied to any machine learning model. It achieves prediction-level interpretability by approximating the original model with an explanation model locally around each prediction.

TODO: Add theory briefing here.

3.1 On Text Classifiers

For a text classification problem, the most straightforward interpretable representation of the model features is a binary indicator vector over a bag of words. The explanation model will then try to reason about which word or token is driving the prediction in which direction. And this is true regardless of the form of the original model features, be it a word count matrix, a term frequency-inverse document frequency (TF-IDF) matrix, or numerical embeddings.
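For instance, a review maps to the simplified binary representation in a few lines (the tiny vocabulary below is of course made up):

```python
# Binary bag-of-words indicator: 1 if the vocabulary word occurs in the text.
vocab = ["good", "bad", "plot", "acting"]
review = "the acting was good"

tokens = set(review.lower().split())
x_prime = [int(w in tokens) for w in vocab]
print(x_prime)  # [1, 0, 0, 1]
```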

In the following we will use Large Movie Review Dataset to do a binary sentiment classification exercise. We will use machine learning libraries such as scikit-learn and tensorflow to quickly build models and use lime to experiment explanation modeling.

import os
import logging
logging.getLogger("tensorflow").setLevel(logging.ERROR)
import warnings
warnings.simplefilter(action="ignore", category=UserWarning)
warnings.simplefilter(action="ignore", category=FutureWarning)

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import tensorflow as tf
print(tf.__version__)
2.0.0
if tf.test.is_gpu_available():
  print(tf.test.gpu_device_name())
/device:GPU:0
import sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split
import joblib

print(sklearn.__version__)
0.22
# Create model dir to cache all models trained in the notebook.
model_dir = "models"
if not os.path.exists(model_dir):
    os.makedirs(model_dir)

# Directory to cache dataset.
home = os.path.expanduser("~")
cache_dir = os.path.join(home, ".keras")

First, we prepare the movie review dataset.3

import tensorflow_datasets as tfds

# Load the data as tf.data.Dataset.
imdb = tfds.load(name="imdb_reviews", as_supervised=True,
                 data_dir=os.path.join(home, "tensorflow_datasets"))

The dataset is perfectly balanced, with 50,000 examples in total: half positive and half negative in sentiment.

# Extract all texts as list since we want to use libraries other than tensorflow as well.
# And since this is a small dataset, we don't care about memory usage.
# We skip the use of a dataset iterator.
imdb_reviews_train = []
imdb_reviews_test = []
imdb_y_train = []
imdb_y_test = []
for x, y in imdb["train"].batch(128):
  imdb_reviews_train.extend(x.numpy())
  imdb_y_train.extend(y.numpy())
for x, y in imdb["test"].batch(128):
  imdb_reviews_test.extend(x.numpy())
  imdb_y_test.extend(y.numpy())

# TF works on bytes, but some other packages may only work on decoded string.
imdb_reviews_train = [b.decode("utf8") for b in imdb_reviews_train]
imdb_reviews_test = [b.decode("utf8") for b in imdb_reviews_test]
imdb_y_train = np.array(imdb_y_train)
imdb_y_test = np.array(imdb_y_test)

# Take one review.
print(imdb_reviews_train[87])
Any movie that portrays the hard-working responsible husband as the person who has to change because of bored, cheating wife is an obvious result of 8 years of the Clinton era.<br /><br />It's little wonder that this movie was written by a woman.
print(imdb_y_train[87])  # Label. 0 as negative and 1 as positive.
0

We use the data prepared by tensorflow-datasets here just to save some time. For those who want to process the data in its original format (where each review lives in its own .txt file), the files can be downloaded with this piece of code:

imdb_remote_path = "https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"
imdb_fname = os.path.basename(imdb_remote_path)
imdb_local_path = os.path.join(cache_dir, "datasets", imdb_fname)

if not os.path.exists(imdb_local_path):
  _ = tf.keras.utils.get_file(fname=imdb_fname, origin=imdb_remote_path,
                              extract=True, cache_dir=cache_dir)

3.1.1 Explain Random Forest

Let’s build a random forest with TF-IDF as our feature space. We will use the popular scikit-learn library for implementation.4

# We drop words that are too frequent or too rare in the training dataset.
imdb_vectorizer = TfidfVectorizer(lowercase=True, min_df=10, max_df=.9)
imdb_X_train = imdb_vectorizer.fit_transform(imdb_reviews_train)
imdb_X_test = imdb_vectorizer.transform(imdb_reviews_test)
print(len(imdb_vectorizer.vocabulary_))  # Without OOV token.
18518
imdb_rf_model_file = "models/text_rf.joblib"

# Save/reload the model to save notebook rendering time.
if os.path.exists(imdb_rf_model_file):
  imdb_rf = joblib.load(imdb_rf_model_file)
else:
  imdb_rf = RandomForestClassifier(n_estimators=300, random_state=64, n_jobs=-2)
  _ = imdb_rf.fit(imdb_X_train, imdb_y_train)
  _ = joblib.dump(imdb_rf, imdb_rf_model_file)

imdb_rf_pred = imdb_rf.predict(imdb_X_test)
imdb_rf_yhat = imdb_rf.predict_proba(imdb_X_test)[:,1]

print(classification_report(imdb_y_test, imdb_rf_pred))
              precision    recall  f1-score   support

           0       0.84      0.86      0.85     12500
           1       0.86      0.84      0.85     12500

    accuracy                           0.85     25000
   macro avg       0.85      0.85      0.85     25000
weighted avg       0.85      0.85      0.85     25000
print(roc_auc_score(imdb_y_test, imdb_rf_yhat))
0.9274221727999999

As a baseline without extensive tuning (indeed, we didn’t tune anything!), random forest seems to perform fairly well on this dataset.

As part of the algorithm’s design we can derive a global view of feature importance, based on how much each feature reduces impurity across all tree splits. For example, we can plot the top 20 features:

sorted_vocab = sorted(imdb_vectorizer.vocabulary_.items(), key=lambda kv: kv[1])
sorted_vocab = [w for w, i in sorted_vocab]

imdb_rf_feat_imp = pd.Series(imdb_rf.feature_importances_, index=sorted_vocab).sort_values()
ax = imdb_rf_feat_imp.tail(20).plot(kind="barh")
plt.show()

The impurity-based ranking must be interpreted with care, though. For example, correlated features will theoretically have a similar impact, but only one of them will gain a higher score (and suppress the other) in the ranking. Which one stands out is essentially random.

In general it is NOT recommended to use impurity-based or even loss-based feature rankings to interpret a tree ensemble model. Such ranking information is still useful for understanding different aspects of the model, and can be used to subset features to counter over-fitting, if any. But it won’t really explain the model per se. And this is exactly why we need an explanation model in the first place.
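The suppression effect is easy to reproduce with a synthetic example: duplicate an informative feature (up to tiny noise) and the two copies share, more or less arbitrarily, the importance a single feature would otherwise receive. A self-contained sketch, assuming nothing beyond scikit-learn:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
x = rng.normal(size=(1000, 1))
X = np.hstack([
    x,                                         # informative feature
    x + rng.normal(scale=0.01, size=x.shape),  # near-duplicate of it
    rng.normal(size=x.shape),                  # pure noise feature
])
y = (x[:, 0] > 0).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
imp = rf.feature_importances_
print(imp.round(3))
# The two correlated columns split the credit; the noise column stays near zero.
```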

Now move on to model explanation with LIME:

from lime.lime_text import LimeTextExplainer

# We need a pipeline since LimeTextExplainer.explain_instance expects raw text input.
imdb_rf_pipe = make_pipeline(imdb_vectorizer, imdb_rf)
imdb_rf_explainer = LimeTextExplainer(class_names=["Negative", "Positive"])

imdb_rf_tp_idx = np.where(np.logical_and(imdb_rf_pred == 1, imdb_y_test == 1))[0]
imdb_rf_fp_idx = np.where(np.logical_and(imdb_rf_pred == 1, imdb_y_test == 0))[0]

# We take one true positive and one false positive example to demo explanation.
imdb_rf_tp_exp = imdb_rf_explainer.explain_instance(
  imdb_reviews_test[imdb_rf_tp_idx[0]], imdb_rf_pipe.predict_proba, num_features=6)
imdb_rf_fp_exp = imdb_rf_explainer.explain_instance(
  imdb_reviews_test[imdb_rf_fp_idx[0]], imdb_rf_pipe.predict_proba, num_features=6)
# For ipynb, one can simply call imdb_tp_exp.show_in_notebook(text=True) to embed the html output.

imdb_rf_tp_exp.save_to_file("/tmp/explain_text_rf_tp.html")
imdb_rf_fp_exp.save_to_file("/tmp/explain_text_rf_fp.html")

A True Positive Prediction Explained

A False Positive Prediction Explained

TODO: Discuss the result.

3.1.2 Explain Neural Nets with Word Embeddings

Now let’s try a neural network model with word embeddings trained from scratch. We use the tensorflow.keras API to quickly build and train a neural net. We average the word embeddings into a document embedding for each review, then feed it through a ReLU layer before the sigmoid activation for cross-entropy optimization.

As an exercise, instead of re-using the vocabulary built by TfidfVectorizer with scikit-learn, we will re-tokenize the text data with the keras.preprocessing module. The inherent consistency within the Keras framework will also simplify our later work on network layering.
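The indexing convention relied upon below can be spelled out in plain Python (a sketch of the convention, not of Keras internals): index 0 is reserved for padding, index 1 for the OOV token, and the remaining indices rank words by training-corpus frequency.

```python
from collections import Counter

texts = ["great movie", "bad movie", "great acting"]

# Count word frequencies across the (toy) training corpus.
counts = Counter(w for t in texts for w in t.lower().split())

# 0 -> padding, 1 -> OOV, 2... -> words by descending frequency.
index_word = {0: "<pad>", 1: "<unk>"}
for i, (w, _) in enumerate(counts.most_common(), start=2):
    index_word[i] = w
print(index_word)
```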

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Build vocabulary. We use similar size as in our previous TfidfVectorizer.
# Since we will use zero padding, 0 cannot be used as OOV index.
# Keras tokenizer by default reserves 0 already. OOV token, if used, will be indexed at 1.
# Note that len(tokenizer.index_word) will be all vocabulary instead of `num_words`.
vocab_size = 20001  # +1 for 0 index used for padding.
oov_token = "<unk>"
tokenizer = Tokenizer(lower=True, oov_token=oov_token, num_words=vocab_size)
tokenizer.fit_on_texts(imdb_reviews_train)

# Encode text with padding to ensure fixed-length input.
seq_train = tokenizer.texts_to_sequences(imdb_reviews_train)
seq_train_padded = pad_sequences(seq_train, padding="post")
maxlen = seq_train_padded.shape[1]
seq_test = tokenizer.texts_to_sequences(imdb_reviews_test)
seq_test_padded = pad_sequences(seq_test, padding="post", maxlen=maxlen)

assert tokenizer.index_word[1] == oov_token
assert seq_train_padded.max() == vocab_size - 1

# Wrap Keras Sequential model with scikit-learn API.
# This is because LimeTextExplainer seems buggy with a native Keras model.
nn_model_file = "models/text_clf_nn.h5"

def nn_model_fn():
  embedding_size = 64
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
      vocab_size, embedding_size, input_length=maxlen,
      mask_zero=True, name="word_embedding"),
    tf.keras.layers.GlobalAveragePooling1D(name="doc_embedding"),
    tf.keras.layers.Dense(embedding_size // 2, activation="relu", name="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid", name="sigmoid")
  ], name="nn_classifier")
  model.compile(optimizer="adam",
                loss="binary_crossentropy",
                metrics=["accuracy"])
  return model

print(nn_model_fn().summary(line_length=90))
Model: "nn_classifier"
__________________________________________________________________________________________
Layer (type)                            Output Shape                        Param #       
==========================================================================================
word_embedding (Embedding)              (None, 2493, 64)                    1280064       
__________________________________________________________________________________________
doc_embedding (GlobalAveragePooling1D)  (None, 64)                          0             
__________________________________________________________________________________________
relu (Dense)                            (None, 32)                          2080          
__________________________________________________________________________________________
sigmoid (Dense)                         (None, 1)                           33            
==========================================================================================
Total params: 1,282,177
Trainable params: 1,282,177
Non-trainable params: 0
__________________________________________________________________________________________
None
imdb_nn = tf.keras.wrappers.scikit_learn.KerasClassifier(nn_model_fn)
if os.path.exists(nn_model_file):
  # Restore the model with wrapper.
  imdb_nn.model = tf.keras.models.load_model(nn_model_file)
  imdb_nn.classes_ = np.array([0, 1])
else:
  metrics = imdb_nn.fit(
    x=seq_train_padded, y=imdb_y_train,
    batch_size=256, epochs=10,
    validation_data=(seq_test_padded, imdb_y_test),
    validation_steps=20,
    callbacks=[
      tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=2),
      tf.keras.callbacks.ModelCheckpoint(nn_model_file, monitor="val_loss", save_best_only=True)
    ],
    verbose=2)

imdb_nn_yhat = np.squeeze(imdb_nn.predict(seq_test_padded))
imdb_nn_pred = (imdb_nn_yhat > .5).astype(int)

print(classification_report(imdb_y_test, imdb_nn_pred))
              precision    recall  f1-score   support

           0       0.88      0.89      0.89     12500
           1       0.89      0.88      0.89     12500

    accuracy                           0.89     25000
   macro avg       0.89      0.89      0.89     25000
weighted avg       0.89      0.89      0.89     25000
print(roc_auc_score(imdb_y_test, imdb_nn_yhat))
0.88776
def nn_predict_fn(text):
  # This is for sklearn wrapper only.
  seq = tokenizer.texts_to_sequences(text)
  seq = pad_sequences(seq, padding="post", maxlen=maxlen)
  return imdb_nn.predict_proba(seq)

imdb_nn_explainer = LimeTextExplainer(class_names=["Negative", "Positive"])

# Explain the same examples as in RF.
imdb_nn_tp_exp = imdb_nn_explainer.explain_instance(
  imdb_reviews_test[imdb_rf_tp_idx[0]], nn_predict_fn, num_features=6)
imdb_nn_fp_exp = imdb_nn_explainer.explain_instance(
  imdb_reviews_test[imdb_rf_fp_idx[0]], nn_predict_fn, num_features=6)

imdb_nn_tp_exp.save_to_file("/tmp/explain_text_nn_tp.html")
imdb_nn_fp_exp.save_to_file("/tmp/explain_text_nn_fp.html")

TODO: Discuss the difference between RF and NN.

3.1.3 Explain Transfer Learning

One step further, let’s use pre-trained word embeddings for the neural net and build another explanation model. We will use GloVe (Pennington, Socher, and Manning (2014)); just the smaller GloVe model, since our dataset is quite small. In building the GloVe embedding matrix we need to take special care of the out-of-vocabulary token AND the padding index, since we will be using the Keras API.

# Download GloVe pre-trained embeddings.
# The file is about 800MB so may take some time.
home = os.path.expanduser("~")
cache_dir = os.path.join(home, ".keras")
glove6b_remote_path = "http://nlp.stanford.edu/data/glove.6B.zip"
glove6b_local_path = os.path.join(cache_dir, "datasets", "glove.6B.50d.txt")
glove6b_fname = os.path.basename(glove6b_remote_path)
if not os.path.exists(glove6b_local_path):
  _ = tf.keras.utils.get_file(fname=glove6b_fname, origin=glove6b_remote_path,
                              extract=True, cache_dir=cache_dir)

glove_all = pd.read_csv(glove6b_local_path, sep=" ", header=None, index_col=0, quoting=3)
# Map vocabulary to pre-trained embeddings.
matched_toks = []
for i, w in tokenizer.index_word.items():
  if i < vocab_size:
    if w in glove_all.index:
      matched_toks.append(w)
    else:
      matched_toks.append(oov_token)

# Note that the GloVe pre-trained embeddings do not include their own OOV token.
# We will use a global average embedding to represent OOV token.
print(len([t for t in matched_toks if t == oov_token]))  # How many OOVs?
861
glove_all.loc[oov_token] = glove_all.values.mean(axis=0)
glove = glove_all.loc[matched_toks].values

# Append dummy 0-index vector to support padding.
glove = np.vstack([np.zeros((1, glove.shape[1])), glove])
print(glove.shape)
(20001, 50)

Now let’s build the neural network. Most of the code is the same as before, except that the Embedding layer is now initialized with a constant matrix. We make the GloVe embeddings trainable so they will further adapt to our specific dataset.

tr_model_file = "models/text_clf_tr.h5"

def tr_model_fn():
  embedding_size = glove.shape[1]
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
      vocab_size, embedding_size, input_length=maxlen,
      embeddings_initializer=tf.keras.initializers.Constant(glove),
      trainable=True, mask_zero=True, name="glove_embedding"),
    tf.keras.layers.GlobalAveragePooling1D(name="doc_embedding"),
    tf.keras.layers.Dense(embedding_size // 2, activation="relu", name="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid", name="sigmoid")
  ], name="tr_classifier")
  model.compile(optimizer="adam",
                loss="binary_crossentropy",
                metrics=["accuracy"])
  return model

print(tr_model_fn().summary(line_length=90))
Model: "tr_classifier"
__________________________________________________________________________________________
Layer (type)                            Output Shape                        Param #       
==========================================================================================
glove_embedding (Embedding)             (None, 2493, 50)                    1000050       
__________________________________________________________________________________________
doc_embedding (GlobalAveragePooling1D)  (None, 50)                          0             
__________________________________________________________________________________________
relu (Dense)                            (None, 25)                          1275          
__________________________________________________________________________________________
sigmoid (Dense)                         (None, 1)                           26            
==========================================================================================
Total params: 1,001,351
Trainable params: 1,001,351
Non-trainable params: 0
__________________________________________________________________________________________
None
imdb_tr = tf.keras.wrappers.scikit_learn.KerasClassifier(tr_model_fn)
if os.path.exists(tr_model_file):
  # Restore the model with wrapper.
  imdb_tr.model = tf.keras.models.load_model(tr_model_file)
  imdb_tr.classes_ = np.array([0, 1])
else:
  metrics = imdb_tr.fit(
    x=seq_train_padded, y=imdb_y_train,
    batch_size=256, epochs=20,
    validation_data=(seq_test_padded, imdb_y_test),
    validation_steps=20,
    callbacks=[
      tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=2),
      tf.keras.callbacks.ModelCheckpoint(tr_model_file, monitor="val_loss", save_best_only=True)
    ],
    verbose=2)

imdb_tr_yhat = np.squeeze(imdb_tr.predict(seq_test_padded))
imdb_tr_pred = (imdb_tr_yhat > .5).astype(int)

print(classification_report(imdb_y_test, imdb_tr_pred))
              precision    recall  f1-score   support

           0       0.88      0.90      0.89     12500
           1       0.89      0.87      0.88     12500

    accuracy                           0.89     25000
   macro avg       0.89      0.89      0.89     25000
weighted avg       0.89      0.89      0.89     25000
print(roc_auc_score(imdb_y_test, imdb_tr_yhat))
0.8855200000000001
def tr_predict_fn(text):
  # This is for sklearn wrapper only.
  seq = tokenizer.texts_to_sequences(text)
  seq = pad_sequences(seq, padding="post", maxlen=maxlen)
  return imdb_tr.predict_proba(seq)

imdb_tr_explainer = LimeTextExplainer(class_names=["Negative", "Positive"])

# Explain the same examples as in RF.
imdb_tr_tp_exp = imdb_tr_explainer.explain_instance(
  imdb_reviews_test[imdb_rf_tp_idx[0]], tr_predict_fn, num_features=6)
imdb_tr_fp_exp = imdb_tr_explainer.explain_instance(
  imdb_reviews_test[imdb_rf_fp_idx[0]], tr_predict_fn, num_features=6)

imdb_tr_tp_exp.save_to_file("/tmp/explain_text_tr_tp.html")
imdb_tr_fp_exp.save_to_file("/tmp/explain_text_tr_fp.html")

TODO: Discussion.

3.1.4 Explain Recurrent Neural Nets

As a final exercise on text classification, let’s experiment with the explanation model on a recurrent neural network (RNN). Note that, even with a single layer, this will be prohibitively slow without a GPU.

rnn_model_file = "models/text_clf_rnn.h5"

def rnn_model_fn():
  embedding_size = 64
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
      vocab_size, embedding_size,
      input_length=maxlen, mask_zero=True, name="word_embedding"),
    tf.keras.layers.GRU(64, dropout=.2, name="GRU"),
    tf.keras.layers.Dense(1, activation="sigmoid", name="sigmoid")
  ], name="rnn_classifier")
  model.compile(optimizer="adam",
                loss="binary_crossentropy",
                metrics=["accuracy"])
  return model

print(rnn_model_fn().summary(line_length=90))
Model: "rnn_classifier"
__________________________________________________________________________________________
Layer (type)                            Output Shape                        Param #       
==========================================================================================
word_embedding (Embedding)              (None, 2493, 64)                    1280064       
__________________________________________________________________________________________
GRU (GRU)                               (None, 64)                          24960         
__________________________________________________________________________________________
sigmoid (Dense)                         (None, 1)                           65            
==========================================================================================
Total params: 1,305,089
Trainable params: 1,305,089
Non-trainable params: 0
__________________________________________________________________________________________
None
imdb_rnn = tf.keras.wrappers.scikit_learn.KerasClassifier(rnn_model_fn)
if os.path.exists(rnn_model_file):
  # Restore the model with wrapper.
  imdb_rnn.model = tf.keras.models.load_model(rnn_model_file)
  imdb_rnn.classes_ = np.array([0, 1])
else:
  metrics = imdb_rnn.fit(
    x=seq_train_padded, y=imdb_y_train,
    batch_size=32, epochs=10,
    validation_data=(seq_test_padded, imdb_y_test),
    validation_steps=20,
    callbacks=[
      tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=2),
      tf.keras.callbacks.ModelCheckpoint(rnn_model_file, monitor="val_loss", save_best_only=True)
    ],
    verbose=2)

imdb_rnn_yhat = np.squeeze(imdb_rnn.predict(seq_test_padded))
imdb_rnn_pred = (imdb_rnn_yhat > .5).astype(int)

print(classification_report(imdb_y_test, imdb_rnn_pred))
              precision    recall  f1-score   support

           0       0.85      0.91      0.88     12500
           1       0.90      0.84      0.87     12500

    accuracy                           0.87     25000
   macro avg       0.88      0.87      0.87     25000
weighted avg       0.88      0.87      0.87     25000
print(roc_auc_score(imdb_y_test, imdb_rnn_yhat))
0.8733200000000001

Since the dataset is rather small, we don’t see any advantage of the RNN over a simple pooling embedding model. Let’s see how the explanation can differ, again for the same two examples:

def rnn_predict_fn(text):
  # This is for sklearn wrapper only.
  seq = tokenizer.texts_to_sequences(text)
  seq = pad_sequences(seq, padding="post", maxlen=maxlen)
  return imdb_rnn.predict_proba(seq)

imdb_rnn_explainer = LimeTextExplainer(class_names=["Negative", "Positive"])

# Explain the same examples as in RF.
imdb_rnn_tp_exp = imdb_rnn_explainer.explain_instance(
  imdb_reviews_test[imdb_rf_tp_idx[0]], rnn_predict_fn, num_features=6)
imdb_rnn_fp_exp = imdb_rnn_explainer.explain_instance(
  imdb_reviews_test[imdb_rf_fp_idx[0]], rnn_predict_fn, num_features=6)

imdb_rnn_tp_exp.save_to_file("/tmp/explain_text_rnn_tp.html")
imdb_rnn_fp_exp.save_to_file("/tmp/explain_text_rnn_fp.html")

TODO: Summarize all text models.

3.2 On Tabular Data Classifiers

Lots of data can be represented in tabular format. Here we will use the UCI Heart Disease dataset for demo. In particular, we use the Cleveland data, which is commonly used in machine learning research.5

ucihd_remote_path = "https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data"
ucihd_fname = os.path.basename(ucihd_remote_path)
ucihd_local_path = os.path.join(cache_dir, "datasets", ucihd_fname)

if not os.path.exists(ucihd_local_path):
  _ = tf.keras.utils.get_file(fname=ucihd_fname, origin=ucihd_remote_path,
                              extract=False, cache_dir=cache_dir)
ucihd_attr = [
  "age",
  "sex",
  "cp",  # chest pain type 1: typical angina 2: atypical angina 3: non-anginal pain 4: asymptomatic
  "trestbps",  # resting blood pressure (in mm Hg on admission to the hospital)
  "chol",  # serum cholestoral in mg/dl
  "fbs",  # (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
  "restecg",  # resting electrocardiographic results 0: normal 1: having ST-T wave abnormality 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
  "thalach",  # maximum heart rate achieved
  "exang",  # exercise induced angina (1 = yes; 0 = no)
  "oldpeak",  # ST depression induced by exercise relative to rest
  "slope",  #  the slope of the peak exercise ST segment
  "ca",  # number of major vessels (0-3) colored by flourosopy
  "thal",  # 3 = normal; 6 = fixed defect; 7 = reversable defect
  "label"  # diagnosis of heart disease (angiographic disease status) 0: < 50% diameter narrowing 1-4: > 50% diameter narrowing
]
ucihd = pd.read_csv(ucihd_local_path, header=None, names=ucihd_attr, na_values="?")
categorical_attr = ["cp", "fbs", "restecg", "exang", "thal"]
for col in categorical_attr:
  ucihd[col] = ucihd[col].astype("category")

# Clean label.
ucihd.loc[ucihd["label"] > 1, "label"] = 1

print(ucihd.shape)
(303, 14)
print(ucihd.groupby("label").size())  # Label distribution.
label
0    164
1    139
dtype: int64
print(ucihd.head())
    age  sex   cp  trestbps   chol  fbs restecg  thalach exang  oldpeak  slope   ca thal  label
0  63.0  1.0  1.0     145.0  233.0  1.0     2.0    150.0   0.0      2.3    3.0  0.0  6.0      0
1  67.0  1.0  4.0     160.0  286.0  0.0     2.0    108.0   1.0      1.5    2.0  3.0  3.0      1
2  67.0  1.0  4.0     120.0  229.0  0.0     2.0    129.0   1.0      2.6    2.0  2.0  7.0      1
3  37.0  1.0  3.0     130.0  250.0  0.0     0.0    187.0   0.0      3.5    3.0  0.0  3.0      0
4  41.0  0.0  2.0     130.0  204.0  0.0     2.0    172.0   0.0      1.4    1.0  0.0  3.0      0

The dataset contains both numerical and categorical features (all already encoded as numerics).

3.2.1 Explain Random Forest

# RF doesn't allow missing value.
# For categorical (as string) we can leave one special category for missing,
# but for numerical we need to do some special encoding or imputation.
ucihd_2 = ucihd.copy()
ucihd_2.loc[ucihd_2["ca"].isna(), "ca"] = -1  # Encode missing numerical.

# One-hot encode all categorical features.
ucihd_2 = pd.get_dummies(ucihd_2, columns=categorical_attr, dummy_na=True)
ucihd_y = ucihd_2.pop("label")
ucihd_X_train, ucihd_X_test, ucihd_y_train, ucihd_y_test = train_test_split(
  ucihd_2, ucihd_y.values, test_size=.3, random_state=64)

ucihd_rf = RandomForestClassifier(n_estimators=100, random_state=64)
_ = ucihd_rf.fit(ucihd_X_train, ucihd_y_train)

ucihd_rf_yhat = ucihd_rf.predict_proba(ucihd_X_test)[:,1]
ucihd_rf_pred = ucihd_rf.predict(ucihd_X_test)

print(classification_report(ucihd_y_test, ucihd_rf_pred))
              precision    recall  f1-score   support

           0       0.84      0.86      0.85        50
           1       0.82      0.80      0.81        41

    accuracy                           0.84        91
   macro avg       0.83      0.83      0.83        91
weighted avg       0.83      0.84      0.83        91
print(roc_auc_score(ucihd_y_test, ucihd_rf_yhat))
0.905609756097561

As one can see, RF performs very well on this dataset.

To explain a model trained with numerical features, lime by default discretizes continuous variables into quantiles for ease of interpretation. The discretization uses statistics derived from the training dataset.
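The discretization is essentially quantile binning. A self-contained sketch on synthetic data (the actual cut points lime uses come from its own discretizer classes):

```python
import numpy as np

rng = np.random.RandomState(0)
age = rng.normal(loc=54, scale=9, size=300)  # A synthetic continuous feature.

# Cut points at the training-set quartiles, so an instance can be
# explained in terms like "48.2 < age <= 60.1" instead of a raw number.
cuts = np.percentile(age, [25, 50, 75])
print(cuts.round(1))

bin_idx = np.digitize(70.0, cuts)  # Which quartile bin a new value falls in.
print(bin_idx)  # 3, i.e. above the 75th percentile.
```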

from lime.lime_tabular import LimeTabularExplainer

cat_ind = [i for i, col in enumerate(ucihd_2.columns) if "_" in col]
ucihd_rf_explainer = LimeTabularExplainer(
  ucihd_X_train.values, class_names=["Negative", "Positive"],
  feature_names=ucihd_2.columns,
  categorical_features=cat_ind)

ucihd_rf_tp_idx = np.where(np.logical_and(ucihd_rf_pred == 1, ucihd_y_test == 1))[0]
ucihd_rf_fp_idx = np.where(np.logical_and(ucihd_rf_pred == 1, ucihd_y_test == 0))[0]

# We take one true positive and one false positive for examples.
ucihd_rf_tp_exp = ucihd_rf_explainer.explain_instance(
  ucihd_X_test.iloc[ucihd_rf_tp_idx[0]], ucihd_rf.predict_proba, num_features=4)
ucihd_rf_fp_exp = ucihd_rf_explainer.explain_instance(
  ucihd_X_test.iloc[ucihd_rf_fp_idx[0]], ucihd_rf.predict_proba, num_features=4)

ucihd_rf_tp_exp.save_to_file("/tmp/explain_tab_rf_tp.html")
ucihd_rf_fp_exp.save_to_file("/tmp/explain_tab_rf_fp.html")

A True Positive Prediction Explained

A False Positive Prediction Explained

TODO: Discuss the two explanations.

3.2.2 Explain Gradient Boosting Trees

Gradient boosting trees (GBT) are a powerful model family, proven to work exceptionally well in many different applications. Yet due to their ensembling nature, GBT models are also hard to interpret in general.

Here we demo lightgbm’s implementation of GBT with LIME explanation.

import lightgbm as lgb

ucihd_tr = lgb.Dataset(ucihd_X_train, label=ucihd_y_train)
ucihd_te = lgb.Dataset(ucihd_X_test, label=ucihd_y_test)

ucihd_lgb_params = {
  "learning_rate": .01,
  "boosting_type": "gbdt",
  "objective": "binary",
  "metric": ["binary_logloss", "auc"],
  "num_leaves": 4,
  "max_depth": 2,
  "min_data_per_leaf": 5,
  "verbose": -1,
  "seed": 64
}

ucihd_bst = lgb.train(
  params=ucihd_lgb_params,
  num_boost_round=300, early_stopping_rounds=20,
  train_set=ucihd_tr, valid_sets=[ucihd_te],
  verbose_eval=10)
Training until validation scores don't improve for 20 rounds
[10]    valid_0's binary_logloss: 0.662208  valid_0's auc: 0.742683
[20]    valid_0's binary_logloss: 0.637688  valid_0's auc: 0.837317
[30]    valid_0's binary_logloss: 0.612505  valid_0's auc: 0.836829
[40]    valid_0's binary_logloss: 0.590748  valid_0's auc: 0.862195
[50]    valid_0's binary_logloss: 0.573202  valid_0's auc: 0.86561
[60]    valid_0's binary_logloss: 0.558692  valid_0's auc: 0.871951
[70]    valid_0's binary_logloss: 0.545684  valid_0's auc: 0.870488
[80]    valid_0's binary_logloss: 0.53395   valid_0's auc: 0.877317
[90]    valid_0's binary_logloss: 0.524152  valid_0's auc: 0.876585
[100]   valid_0's binary_logloss: 0.514324  valid_0's auc: 0.879268
[110]   valid_0's binary_logloss: 0.506591  valid_0's auc: 0.877317
[120]   valid_0's binary_logloss: 0.499578  valid_0's auc: 0.877073
Early stopping, best iteration is:
[100]   valid_0's binary_logloss: 0.514324  valid_0's auc: 0.879268
ucihd_lgb_yhat = ucihd_bst.predict(ucihd_X_test)
ucihd_lgb_pred = (ucihd_lgb_yhat > .5).astype(int)

print(classification_report(ucihd_y_test, ucihd_lgb_pred))
              precision    recall  f1-score   support

           0       0.85      0.80      0.82        50
           1       0.77      0.83      0.80        41

    accuracy                           0.81        91
   macro avg       0.81      0.81      0.81        91
weighted avg       0.82      0.81      0.81        91
print(roc_auc_score(ucihd_y_test, ucihd_lgb_yhat))
0.8792682926829268

In this particular (rather small) dataset RF indeed outperforms GBT. As a matter of fact, based on existing benchmarks a simple logistic regression may have an even higher score for this problem. Nevertheless, let’s move on to our explanation model with LIME:

def ucihd_lgb_predict_fn(x):
  # We need to output 2 columns for binary prob prediction.
  p = ucihd_bst.predict(x).reshape(-1, 1)
  return np.hstack((1 - p, p))

ucihd_lgb_explainer = LimeTabularExplainer(
  ucihd_X_train.values, class_names=["Negative", "Positive"],
  feature_names=ucihd_2.columns,
  categorical_features=cat_ind)

# We take the same examples previously explained in our RF explanation model.
ucihd_lgb_tp_exp = ucihd_lgb_explainer.explain_instance(
  ucihd_X_test.iloc[ucihd_rf_tp_idx[0]], ucihd_lgb_predict_fn, num_features=4)
ucihd_lgb_fp_exp = ucihd_lgb_explainer.explain_instance(
  ucihd_X_test.iloc[ucihd_rf_fp_idx[0]], ucihd_lgb_predict_fn, num_features=4)

ucihd_lgb_tp_exp.save_to_file("/tmp/explain_tab_lgb_tp.html")
ucihd_lgb_fp_exp.save_to_file("/tmp/explain_tab_lgb_fp.html")

TODO: Discussion on the explanation.

Optimized Categorical Encoding in lightgbm

This section is a digression on lightgbm usage.

Since lime’s API requires us to prepare our dataset in a one-hot encoded representation, our lightgbm code uses the same data pipeline as the scikit-learn random forest. But that is actually not optimal for lightgbm. The following code chunk showcases the best practice for encoding categoricals in lightgbm: we don’t encode them at all!

# We leave both missings and categoricals as-is in the dataset.
ucihd_train, ucihd_test = train_test_split(ucihd, test_size=.3, random_state=64)
ucihd_tr = lgb.Dataset(
  ucihd_train.drop("label", axis=1), label=ucihd_train["label"],
  categorical_feature=categorical_attr,
  free_raw_data=False)
ucihd_te = lgb.Dataset(
  ucihd_test.drop("label", axis=1), label=ucihd_test["label"],
  categorical_feature=categorical_attr,
  free_raw_data=False)

ucihd_bst_2 = lgb.train(
  params=ucihd_lgb_params,
  num_boost_round=300, early_stopping_rounds=20,
  train_set=ucihd_tr, valid_sets=[ucihd_te],
  verbose_eval=-1)
Training until validation scores don't improve for 20 rounds
Early stopping, best iteration is:
[100]   valid_0's binary_logloss: 0.514324  valid_0's auc: 0.879268
ucihd_lgb_yhat = ucihd_bst_2.predict(ucihd_test.drop("label", axis=1))
ucihd_lgb_pred = (ucihd_lgb_yhat > .5).astype(int)

print(roc_auc_score(ucihd_test["label"], ucihd_lgb_yhat))
0.8792682926829268

To summarize, there are two special properties of the lightgbm algorithm. First, lightgbm natively treats missing values as a special tree split point. This allows us to keep missing values as-is, which in many cases results in better accuracy than imputation.6

In addition, lightgbm encodes categorical variables internally in a more efficient way, so we don’t even need to do one-hot encoding ourselves. Of course in this tiny dataset we won’t see any noticeable difference, but for large applications the impact can be huge. Moreover, by skipping the one-hot encoding pipeline our code becomes much neater as well.
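For instance, instead of one-hot encoding we can hand lightgbm integer-coded categories directly. With pandas this is a one-liner (a sketch with a hypothetical column): each level becomes an integer code, and missing values map to -1, which lightgbm can then treat as missing.

```python
import pandas as pd

# A hypothetical raw categorical column with a missing value.
s = pd.Series(["typical", "asymptomatic", None, "typical"])

# pandas category codes: each level becomes an integer and NaN becomes
# -1, which lightgbm can then treat as missing.
codes = s.astype("category").cat.codes
print(codes.tolist())  # → [1, 0, -1, 1]
```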

3.3 On Image Classifier

TODO: Use a pre-trained model?

4 Shapley Regression Values

TODO: Theory Briefing here.

5 SHAP

Lundberg and Lee (2017) propose SHAP (SHapley Additive exPlanations), yet another additive feature attribution method for model explainability. It is a more general approach, of which LIME is a special case. Just like LIME, in theory it can be applied to any machine learning model, but it also comes with a customized fast implementation particularly for gradient boosting trees (GBT), supporting the APIs of well-known GBT libraries such as xgboost, lightgbm, and catboost.

The interpretability provided by SHAP is again local: it assigns each feature an importance value for a particular prediction. Hence, for any given prediction, it reveals what may be driving the model to make that prediction.

shap also comes with more visualization methods for feature investigation, especially for feature interaction exploration.

TODO: Theory Briefing here.

5.1 On Text Classifiers

5.1.1 Explain Random Forest

shap.TreeExplainer is optimized for GBT but not for RF. For a high-dimensional model such as a bag-of-words model, a non-GBT model will suffer from high computation cost. Hence we skip the discussion on RF and move on to a GBT implementation.

5.1.2 Explain Gradient Boosting Trees

In the previous section we didn’t train a GBT for the text classification problem. So let’s quickly build one such model first (with the same TF-IDF vectorization as we did for the random forest model).

# lightgbm does not allow utf-8 encoded feature names.
# Since important tokens are most likely ascii-compatible for our dataset,
# we simply strip non-ascii as a workaround for this exercise.
def remove_non_ascii(s):
  return "".join([i if ord(i) < 128 else "_" for i in s])

sorted_vocab_ascii = [remove_non_ascii(v) for v in sorted_vocab]

imdb_X_tr = lgb.Dataset(imdb_X_train, label=imdb_y_train, feature_name=sorted_vocab_ascii)
imdb_X_te = lgb.Dataset(imdb_X_test, label=imdb_y_test, feature_name=sorted_vocab_ascii)

imdb_lgb_params = {
  "learning_rate": .05,
  "boosting_type": "gbdt",
  "objective": "binary",
  "metric": ["binary_logloss", "auc"],
  "num_leaves": 16,
  "max_depth": 4,
  "min_data_per_leaf": 20,
  "verbose": -1
}

imdb_lgb_model_file = "models/text_clf_lgb.txt"

# Save/reload model to save notebook rendering time.
if os.path.exists(imdb_lgb_model_file):
  # TODO:
  # Parameters are not loaded back? A bug? (Which cause the subsequent call to shap_values fail.)
  # https://github.com/microsoft/LightGBM/issues/2613
  imdb_bst = lgb.Booster(model_file=imdb_lgb_model_file, params=imdb_lgb_params)
else:
  imdb_bst = lgb.train(
    params=imdb_lgb_params,
    num_boost_round=1000, early_stopping_rounds=20,
    train_set=imdb_X_tr, valid_sets=[imdb_X_te],
    verbose_eval=100)
  _ = imdb_bst.save_model(imdb_lgb_model_file)

imdb_lgb_yhat = imdb_bst.predict(imdb_X_test)
imdb_lgb_pred = (imdb_lgb_yhat > .5).astype(int)

print(classification_report(imdb_y_test, imdb_lgb_pred))
              precision    recall  f1-score   support

           0       0.88      0.85      0.86     12500
           1       0.85      0.88      0.87     12500

    accuracy                           0.86     25000
   macro avg       0.86      0.86      0.86     25000
weighted avg       0.86      0.86      0.86     25000
print(roc_auc_score(imdb_y_test, imdb_lgb_yhat))
0.9416203136

Just like RF we will have access to the overall feature importance with a GBT model:7

ax = lgb.plot_importance(imdb_bst, max_num_features=20)
plt.show()

Now for the explanation model: since shap.TreeExplainer is optimized for GBT speed, we can feed in all testing examples and calculate all SHAP values at once.

import shap

# Sparse matrix is supported by shap for lightgbm models.
imdb_lgb_explainer = shap.TreeExplainer(imdb_bst)
imdb_lgb_shap_values = imdb_lgb_explainer.shap_values(imdb_X_test)

def imdb_lgb_shap_plot(test_id, matplotlib=True):
  shap_plt = shap.force_plot(
    imdb_lgb_explainer.expected_value[1],
    imdb_lgb_shap_values[1][test_id,:],
    imdb_X_test[test_id,:].toarray(),  # We still need a dense matrix here.
    feature_names=sorted_vocab,
    matplotlib=matplotlib
  )
  return shap_plt

Global Importance

We can derive global importance from SHAP values. Note that this is different from the loss/impurity- or split-count-based feature rankings derived from RF/GBT during training: it is an aggregation of all local prediction explanations (contributions) over the testing data.
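As a sketch of how such a bar summary can be derived (the SHAP value matrix below is made up), the global importance of a feature is simply the mean absolute SHAP value it receives across all explained rows:

```python
import numpy as np

# A made-up SHAP value matrix of shape (n_samples, n_features).
shap_vals = np.array([
    [ 0.4, -0.1, 0.0],
    [-0.6,  0.2, 0.1],
    [ 0.5, -0.3, 0.0],
])

# Global importance = mean absolute SHAP value per feature.
global_imp = np.abs(shap_vals).mean(axis=0)
ranking = np.argsort(global_imp)[::-1]  # most important feature first
print(global_imp.round(3), ranking)
```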

shap.summary_plot(imdb_lgb_shap_values, imdb_X_test, feature_names=sorted_vocab,
                  plot_type="bar", max_display=20, show=False)
plt.show()

Local Explanation

imdb_lgb_shap_plot(imdb_rf_tp_idx[0])

imdb_lgb_shap_plot(imdb_rf_fp_idx[0])

TODO: Discuss the result. Anything different from LIME?

By default shap for lightgbm shows log-odds rather than probability in the plot. To verify this:

# Take the first true positive to examine:
p = imdb_bst.predict(imdb_X_test[imdb_rf_tp_idx[0],:].toarray())
print(p)
[0.71683258]
print(np.log(p / (1 - p)))
[0.92880398]

5.1.3 Explain Neural Nets with Word Embeddings

As of 2019-12-09 DeepExplainer does not yet support TF 2.0.8 And GradientExplainer is not yet well documented for TF 2.0. So we will use KernelExplainer, an implementation-agnostic explainer in shap. The compromise is that it runs very slowly for each prediction.
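To get a feel for what KernelExplainer does under the hood, here is a conceptual numpy sketch (not shap's actual implementation; the model, instance, and background values are all made up): enumerate feature coalitions, evaluate the model with absent features replaced by a background value, and solve a weighted linear regression with the Shapley kernel weights. For a linear model the recovered attributions equal w_i * (x_i - b_i) exactly.

```python
import numpy as np
from itertools import product
from math import comb

# A made-up linear model f(x) = w . x to explain.
w = np.array([2.0, -1.0, 0.5])
def f(X):
    return X @ w

x = np.array([1.0, 3.0, 2.0])  # instance to explain
b = np.array([0.0, 1.0, 0.0])  # single background (reference) sample
M = len(x)

# Enumerate all 2^M coalitions z' in {0, 1}^M.
Z = np.array(list(product([0, 1], repeat=M)), dtype=float)

# Evaluate the model on masked inputs: present features come from x,
# absent features are filled in from the background b.
y = f(Z * x + (1 - Z) * b) - f(b[None, :])

# Shapley kernel weights; the empty and full coalitions get a huge
# weight to (approximately) enforce the two exact constraints.
def kernel_weight(s):
    if s == 0 or s == M:
        return 1e6
    return (M - 1) / (comb(M, s) * s * (M - s))

sw = np.array([kernel_weight(int(z.sum())) for z in Z])

# Weighted least squares: y ~ Z @ phi.
sq = np.sqrt(sw)
phi, *_ = np.linalg.lstsq(sq[:, None] * Z, sq * y, rcond=None)

print(phi)          # the estimated attributions
print(w * (x - b))  # exact answer for a linear model: [2., -2., 1.]
```

With only three features we can enumerate all coalitions; shap instead samples them, which is why it is slow for wide inputs.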

imdb_exp_ind = np.array([imdb_rf_tp_idx[0], imdb_rf_fp_idx[0]])
# KernelExplainer.
def mm(X):
  return imdb_tr.predict_proba(X)[:,1]

imdb_nn_shap_explainer = shap.KernelExplainer(mm, seq_train_padded[:100])
# This is VERY slow...
imdb_nn_kernel_shap_values = imdb_nn_shap_explainer.shap_values(seq_test_padded[imdb_exp_ind])
# TODO:
# Contribution is attributed to original sequence input.
# In order to make explanation readable,
# we need to map each position to original word id then to word.

100%|####################################################################################| 2/2 [02:48<00:00, 84.29s/it]
# TODO: Make sure everything works here.

# shap does not support keras models in the scikit-learn wrapper.
# Let's re-build the model and retain its Sequential class.
dl_model = model_fn()
metrics = dl_model.fit(
  x=seq_train_padded, y=imdb_y_train,
  batch_size=256, epochs=20,
  validation_data=(seq_test_padded, imdb_y_test),
  validation_steps=20,
  callbacks=[
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=2),
    tf.keras.callbacks.ModelCheckpoint(tr_model_file, monitor="val_loss", save_best_only=True)
  ],
  verbose=0)

# DeepExplainer.
dl_shap_explainer = shap.DeepExplainer(dl_model, seq_train_padded)  # Won't work.

# GradientExplainer.
imdb_nn_shap_explainer = shap.GradientExplainer(dl_model, seq_train_padded[:100])

imdb_nn_shap_explainer = shap.GradientExplainer(
  (imdb_tr.layers[0].input, imdb_tr.layers[-1].output),  # Not working for TF 2.0.
  seq_train_padded[:100])
imdb_nn_shap_explainer.shap_values(seq_test_padded[:3])  # Error here.

5.2 On Tabular Data Classifier

5.2.1 Explain Random Forest

ucihd_rf_explainer = shap.TreeExplainer(ucihd_rf)
ucihd_rf_shap_values = ucihd_rf_explainer.shap_values(ucihd_X_test)

def ucihd_rf_shap_plot(test_id, matplotlib=True):
  shap_plt = shap.force_plot(
    ucihd_rf_explainer.expected_value[1],
    ucihd_rf_shap_values[1][test_id,:],
    ucihd_X_test.iloc[[test_id]],
    matplotlib=matplotlib
  )
  return shap_plt

Global Feature Importance

Split-time-based feature ranking
ucihd_rf_feat_imp = pd.Series(ucihd_rf.feature_importances_, index=ucihd_X_train.columns).sort_values()
ax = ucihd_rf_feat_imp.tail(10).plot(kind="barh")
plt.show()

Shap value feature ranking
shap.summary_plot(ucihd_rf_shap_values, ucihd_X_train,
                  plot_type="bar", max_display=10, show=False)
plt.show()

Local Explanation

ucihd_rf_shap_plot(ucihd_rf_tp_idx[0])

ucihd_rf_shap_plot(ucihd_rf_fp_idx[0])

5.2.2 Explain Gradient Boosting Trees

For GBT we feed the model that is optimized, where categoricals are encoded internally without explicit one-hot encoding.

ucihd_lgb_explainer = shap.TreeExplainer(ucihd_bst_2)
ucihd_lgb_shap_values = ucihd_lgb_explainer.shap_values(ucihd_test.drop("label", axis=1))

def ucihd_lgb_shap_plot(test_id, matplotlib=True):
  shap_plt = shap.force_plot(
    ucihd_lgb_explainer.expected_value[1],
    ucihd_lgb_shap_values[1][test_id,:],
    ucihd_test.iloc[[test_id]].drop("label", axis=1),
    matplotlib=matplotlib
  )
  return shap_plt

Global Feature Importance

Split-time-based feature ranking
ax = lgb.plot_importance(ucihd_bst_2, max_num_features=10)
plt.show()

Shap value feature ranking
shap.summary_plot(ucihd_lgb_shap_values, ucihd_test.drop("label", axis=1),
                  plot_type="bar", max_display=10, show=False)
plt.show()

Local Explanation

ucihd_lgb_shap_plot(ucihd_rf_tp_idx[0])

ucihd_lgb_shap_plot(ucihd_rf_fp_idx[0])

5.2.3 The Impact of One-Hot Encoding On Explanation

As one may now realize, by explicitly one-hot encoding the categorical features we essentially split them into separate features in the interpretable representation. This can be either good or bad, depending on the actual use case. In this respect, a library such as lightgbm provides the flexibility to let us choose whether or not to do the one-hot encoding. So the way we want to construct the explanation model may well affect our implementation of the original model!

5.3 On Image Classifier

TODO: Use a pre-trained model?

6 Explainable Boosting Machine

Nori et al. (2019) publish the open source package interpret, a fast implementation of Generalized Additive Models with Pairwise Interactions, or GA2M (Lou et al. (2013)). As of 2019-12-09, interpret is still in its alpha release with limited documentation. The library contains two groups of modeling frameworks:

  • glassbox: explainable machine learning models
  • blackbox: machine learning explanation models (such as LIME and SHAP)

We’ve already covered the mainstream approach of the second group, i.e., models that (locally) approximate the original model (treated as a black box) for better explainability. The more interesting part of interpret is that it brings in another type of model that is readily interpretable from its very origin, yet still competitively accurate: the Explainable Boosting Machine, or EBM.

EBM is an additive model of the form:

\[ g(E(y)) = \beta_0 + \sum f_j (x_j) + \sum f_{ij}(x_i, x_j), \]

where \(g(\cdot)\) is a link function (sigmoid for binary classification, for example), \(f_j\) is the feature function for the \(j\)-th feature, learned by a gradient boosting machine with only that feature at a time, in a round-robin fashion over all features, and \(f_{ij}\) is a pairwise interaction feature function that further boosts the accuracy of the model while remaining interpretable.

The model is interpretable since the contribution of any individual feature can be directly quantified by its corresponding feature function \(f_j\). Such explanation extends up to pairwise interactions if pairwise feature functions are also estimated.
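To make the additive structure concrete, here is a minimal sketch of an EBM-style prediction with made-up feature functions; each f_j is just a lookup over binned feature values, and its output for the instance at hand is the explanation:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Made-up intercept and feature functions (in reality each f_j is a
# per-feature curve learned by boosting; here: simple lookup tables).
beta_0 = -0.5
f_age = {0: -0.8, 1: 0.1, 2: 0.9}  # binned age: young / middle / old
f_sex = {0: -0.3, 1: 0.4}          # a binary categorical

def ebm_predict(age_bin, sex):
    # The score is additive in per-feature contributions, so each term
    # directly explains its feature's effect on the log-odds.
    return sigmoid(beta_0 + f_age[age_bin] + f_sex[sex])

p = ebm_predict(age_bin=2, sex=1)
print(round(p, 3))  # age contributes +0.9, sex +0.4 to the log-odds
```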

TODO: How to detect pairwise interaction? Brief the FAST algorithm.

6.1 On Text/Image Data

EBM is not efficient for text datasets. Due to the algorithm’s design it would take too long to run on a bag-of-words model, since there are too many feature functions to estimate. If we fit an EBM on the movie review dataset, even though it is not large, we will encounter OOM (out-of-memory) issues. As a result, we skip the discussion of EBM on a text classifier. (The same restriction applies to image datasets.)

6.2 On Tabular Data

ExplainableBoostingClassifier has a scikit-learn-style API and hence is straightforward to use.

from interpret.glassbox import ExplainableBoostingClassifier

ucihd_ebm = ExplainableBoostingClassifier(
  n_estimators=16, feature_names=ucihd_2.columns, n_jobs=1)
_ = ucihd_ebm.fit(ucihd_X_train, ucihd_y_train)
WARNING: Logging before flag parsing goes to stderr.
W1209 23:23:00.540163 11636 all.py:334] Passing a numpy array to schema autogen when it should be dataframe.
ucihd_ebm_yhat = ucihd_ebm.predict_proba(ucihd_X_test)[:,1]
ucihd_ebm_pred = (ucihd_ebm_yhat > .5).astype(int)

print(classification_report(ucihd_y_test, ucihd_ebm_pred))
              precision    recall  f1-score   support

           0       0.88      0.88      0.88        50
           1       0.85      0.85      0.85        41

    accuracy                           0.87        91
   macro avg       0.87      0.87      0.87        91
weighted avg       0.87      0.87      0.87        91
print(roc_auc_score(ucihd_y_test, ucihd_ebm_yhat))
0.933170731707317

The model performs very well on the heart disease dataset, outperforming both RF and GBT.

6.2.1 Global Explanation

interpret comes with a rich set of visualization tools (with plotly as its backend). Model explanation is divided into two groups: global and local.

For global explanation, we have access to both global feature importance and per-feature contribution statistics.

ucihd_ebm_global = ucihd_ebm.explain_global()
# All feature info:
print(ucihd_ebm_global.selector)

           Name         Type  # Unique  % Non-zero
0           age   continuous        40       1.000
1           sex  categorical         2       0.665
2      trestbps   continuous        45       1.000
3          chol   continuous       129       1.000
4       thalach   continuous        83       1.000
5       oldpeak   continuous        39       0.679
6         slope   continuous         3       1.000
7            ca   continuous         5       0.387
8        cp_1.0  categorical         2       0.080
9        cp_2.0  categorical         2       0.170
10       cp_3.0  categorical         2       0.292
11       cp_4.0  categorical         2       0.458
12       cp_nan  categorical         1       0.000
13      fbs_0.0  categorical         2       0.830
14      fbs_1.0  categorical         2       0.170
15      fbs_nan  categorical         1       0.000
16  restecg_0.0  categorical         2       0.505
17  restecg_1.0  categorical         2       0.014
18  restecg_2.0  categorical         2       0.481
19  restecg_nan  categorical         1       0.000
20    exang_0.0  categorical         2       0.670
21    exang_1.0  categorical         2       0.330
22    exang_nan  categorical         1       0.000
23     thal_3.0  categorical         2       0.571
24     thal_6.0  categorical         2       0.071
25     thal_7.0  categorical         2       0.349
26     thal_nan  categorical         2       0.009
# Global feature importance.
ucihd_ebm_global.visualize().write_html("/tmp/ucihd_ebm_feat_imp.html", include_plotlyjs=False)

# Global contribution of the first feature.
ucihd_ebm_global.visualize(0).write_html("/tmp/ucihd_ebm_age_imp.html", include_plotlyjs=False)

# Global contribution of the second feature.
ucihd_ebm_global.visualize(1).write_html("/tmp/ucihd_ebm_sex_imp.html", include_plotlyjs=False)

Feature Importance

Feature Contribution: Age

Feature Contribution: Gender

6.2.2 Local Explanation

More importantly, we want to be able to explain a specific model prediction locally. This can also be done easily with a couple of lines:

# Explain the same instances previously on RF.
ucihd_exp_ind = np.array([ucihd_rf_tp_idx[0], ucihd_rf_fp_idx[0]])

# We can feed multiple examples at the same time.
ucihd_ebm_local = ucihd_ebm.explain_local(
  ucihd_X_test.iloc[ucihd_exp_ind,:], ucihd_y_test[ucihd_exp_ind])
ucihd_ebm_local.visualize(0).write_html("/tmp/ucihd_ebm_exp_tp.html", include_plotlyjs=False)
ucihd_ebm_local.visualize(1).write_html("/tmp/ucihd_ebm_exp_fp.html", include_plotlyjs=False)

7 References

Abadi, Martı́n, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, et al. 2015. “TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems.” http://tensorflow.org/.

Lou, Yin, Rich Caruana, Johannes Gehrke, and Giles Hooker. 2013. “Accurate Intelligible Models with Pairwise Interactions.” In Proceedings of the 19th Acm Sigkdd International Conference on Knowledge Discovery and Data Mining, 623–31. ACM.

Lundberg, Scott M, and Su-In Lee. 2017. “A Unified Approach to Interpreting Model Predictions.” In Advances in Neural Information Processing Systems 30, edited by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 4765–74. Curran Associates, Inc. http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf.

Maas, Andrew L., Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. 2011. “Learning Word Vectors for Sentiment Analysis.” In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 142–50. Portland, Oregon, USA: Association for Computational Linguistics. http://www.aclweb.org/anthology/P11-1015.

Nori, Harsha, Samuel Jenkins, Paul Koch, and Rich Caruana. 2019. “InterpretML: A Unified Framework for Machine Learning Interpretability.” arXiv Preprint arXiv:1909.09223.

Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, et al. 2011. “Scikit-Learn: Machine Learning in Python.” Journal of Machine Learning Research 12: 2825–30.

Pennington, Jeffrey, Richard Socher, and Christopher Manning. 2014. “Glove: Global Vectors for Word Representation.” In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (Emnlp), 1532–43.

Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016. “Why Should I Trust You?: Explaining the Predictions of Any Classifier.” In Proceedings of the 22nd Acm Sigkdd International Conference on Knowledge Discovery and Data Mining, 1135–44. ACM.

Ushey, Kevin, JJ Allaire, and Yuan Tang. 2019. Reticulate: Interface to ’Python’. https://CRAN.R-project.org/package=reticulate.


  1. Some people will further differentiate explainability from interpretability, by characterizing interpretability as knowing how without knowing why, and explainability as not only knowing how but also knowing why. In this notebook, for simplicity, we don’t take such an approach.↩︎

  2. In many such methods, the simplified input is the indicator of feature presence. One example: Shapley regression values.↩︎

  3. Keras also comes with the dataset preprocessed as integer sequences (from tf.keras.datasets import imdb).↩︎

  4. It will be much faster if we choose xgboost’s or lightgbm’s implementation of random forest. However, to demonstrate compatibility of lime with scikit-learn we purposely choose the slower implementation here.↩︎

  5. V.A. Medical Center, Long Beach and Cleveland Clinic Foundation:Robert Detrano, M.D., Ph.D.↩︎

  6. xgboost was the first to introduce such missing-value treatment among the GBT packages; lightgbm follows.↩︎

  7. By default lightgbm calculates the importance by counting how many times a feature contributes to an optimal split during model training. It also supports the impurity-based approach with the argument importance_type set to "gain".↩︎

  8. https://github.com/slundberg/shap/issues/850.↩︎

---
title: "On Model Explainability"
subtitle: "From LIME, SHAP, to Explainable Boosting"
author:
- name: Kyle Chung
  affiliation:
date: "`r format(Sys.time(), '%d %b %Y')` Last Updated (09 Dec 2019 First Uploaded)"
output:
  html_notebook:
    highlight: tango
    number_sections: yes
    theme: paper
    toc: yes
    toc_depth: 3
    toc_float: yes
    includes:
      in_header: /tmp/meta_header.html
  code_download: true
bibliography: model_explain.bib
nocite: |
  @reticulate
  @maas-EtAl:2011:ACL-HLT2011
  @scikit-learn
  @tensorflow2015-whitepaper
abstract: |
  Model explainability has gained more and more attention recently among machine learning practitioners, especially with the popularization of deep learning frameworks, which further promotes the use of increasingly complicated models in pursuit of accuracy. In reality, however, the model with the highest accuracy may not be the one that can be deployed. Trust is one important factor affecting the adoption of complicated models. In this notebook we give a brief introduction to several popular methods for model explainability, with a focus on hands-on examples demonstrating how we can actually explain a model under a variety of use cases.
---
<!--For equation reference in Rmd.-->
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
  TeX: { equationNumbers: { autoNumber: "AMS" } }
});
</script>

<!-- Embed lime javascript library for explanation visualization. -->
<script src="lime.js"></script>

<!-- Embed plotly javascript library.
  This is the backend for interpretML visualization.
-->
<script src="../../../site_libs/utils/plotly-1.51.1.min.js"></script>

```{r meta, include=FALSE}
meta_header_file <- file("/tmp/meta_header.html")

# Add open graph meta.
meta <- c(
  '<meta name="author" content="Kyle Chung">',
  '<meta property="og:title" content="On Model Explainability: From Shap, Lime, to Interpretable Boosting">',
  '<meta property="og:type" content="article">',
  '<meta property="og:url" content="https://everdark.github.io/k9/notebooks/ml/model_explain/model_explain.nb.html">',
  '<meta property="og:image" content="https://everdark.github.io/k9/assets/androidify.jpg">',
  '<meta property="og:description" content="A data science notebook about machine learning model explainability.">'
)
contents <- meta

# Add Github corner.
github_corner_svg <- "../../../assets/github_corner.html"
github_corner_conf <- list(github_link="https://github.com/everdark/k9/tree/master/notebooks/ml/model_explain")
contents <- c(contents, stringr::str_interp(readLines(github_corner_svg), github_corner_conf))
writeLines(contents, meta_header_file)

close(meta_header_file)
```

```{r setup, include=FALSE}
library(reticulate)
r <- try(use_python(Sys.getenv("PYTHON_PATH"), required=TRUE), silent=TRUE)
if ( is(r, "try-error") ) {
  r <- try(use_virtualenv(Sys.getenv("PYTHON_PATH"), required=TRUE), silent=TRUE)
  if ( is(r, "try-error") ) use_condaenv(Sys.getenv("PYTHON_PATH"), required=TRUE)
}

# Utility to post-process html output.
library(xml2)

write_lime_js <- function(infile) {
  # lime html output contains a huge js string,
  # to reduce notebook file size we only want to declare the js once.
  outfile <- "lime.js"
  doc <- as_list(read_html(infile))
  js_str <- doc$html$head$script[[1]]
  # Use h4 for text example header to avoid being included in rmd toc.
  js_str <- gsub("h3", "h4", js_str)
  writeLines(js_str, outfile, useBytes=TRUE)
}

parse_lime_html_output <- function(infile, exclude_js=TRUE) {
  outfile <- tempfile()
  doc <- read_html(infile)
  if ( exclude_js ) xml_remove(xml_child(doc))
  write_html(doc, outfile)
  outfile
}
```

---

This notebook is written with [`reticulate`](https://github.com/rstudio/reticulate),
a package that allows inter-operation between R and Python.

---

# Motivation

Why do we need to explain a machine learning model?
The benefit of an explainable model over a black-box model is that it can be *trusted*.
Trust can be important in many real applications where the successful deployment of a machine learning model requires the trust from end users.
Sometimes trust plays an even bigger role than model accuracy.

Other than trust,
model explainability (or interpretability, interchangeably used hereafter) may also guide us in the correct direction to further improve the model.
^[Some people will further differentiate explainability from interpretability,
by characterizing interpretability as knowing how without knowing why,
and explainability as not only knowing how but also knowing why.
In this notebook, for simplicity, we don't take such an approach.]

In general,
linear models are more interpretable than non-linear ones.
But the former also tend to suffer from lower accuracy.
More advanced, and hence more complicated, models usually have worse interpretability.

One should not confuse model explainability with actual causality.
Being able to explain a model doesn't mean that we can identify any ground-truth causal relation behind the model.
Model explainability is for and only for the model,
not for the facts we'd like to model.
Nevertheless,
understanding how we can reason about the model will definitely help us better capture the actual pattern behind the scenes.

In this notebook we will walk through 3 popular approaches to model prediction explanation,
each of which comes with a dedicated Python package:

1. [`shap`](https://github.com/slundberg/shap)
2. [`lime`](https://github.com/marcotcr/lime)
3. [`interpret`](https://github.com/interpretml/interpret)

# Explanation Models

An explanation model $g(x)$ is an *interpretable approximation* of the original model $f(x)$.
Its sole purpose is to give extra explainability the original model fails to provide,
due to its own complexity.

The general idea is to use a simplified input $x\prime$ such that $x = h_x(x\prime)$,
where $h_x(\cdot)$ is a mapping function for any given raw input $x$.
Then the interpretable approximation can be written as:

$$
g(x\prime) \approx f(h_x(x\prime)).
$$

The *additive feature attribution methods* specify the explanation model of the following form:

$$
g(x\prime) = \phi_0 + \sum_{i = 1}^m \phi_i x_i\prime,
$$

where $m$ is the total number of simplified features,
and $x_i\prime \in \{0, 1\}$ is simply an indicator.^[In many such methods, the simplified input is the indicator of feature presence. One example: Shapley regression values.]
Apparently,
the choice of an additive model is for (linear) interpretability.
The simplified features are an *interpretable representation* of the original model features.
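As a minimal numeric sketch of this form
(all values below are made up for illustration):

```python
import numpy as np

# All values are made up for illustration.
phi_0 = 0.2                       # base value (average model output)
phi = np.array([0.5, -0.3, 0.1])  # per-feature attributions
x_prime = np.array([1, 1, 0])     # binary presence indicators

g = phi_0 + phi @ x_prime
print(g)  # ≈ 0.4, approximating f(h_x(x')) for this instance
```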

# LIME

One very popular such above additive model is LIME (@ribeiro2016should).
LIME stands for **Local Interpretable Model-Agnostic Explanations.**
As its full name suggests,
LIME can be applied to *any* machine learning model.
LIME achieves prediction-level interpretability by approximating the original model with an explanation model locally around that prediction.

**TODO: Add theory briefing here.**

## On Text Classifiers

For a text classification problem,
the most straightforward interpretable representation of the model features is a binary indicator vector over a bag of words.
So the explanation model will try to reason about which word or token is driving the prediction in which direction.
This holds regardless of the form of the original model features,
be it a word count matrix,
a term frequency-inverse document frequency (TF-IDF) matrix,
or numerical embeddings.

In the following we will use [Large Movie Review Dataset](https://ai.stanford.edu/~amaas/data/sentiment/) to do a binary sentiment classification exercise.
We will use machine learning libraries such as `scikit-learn` and `tensorflow` to quickly build models and use `lime` to experiment explanation modeling.

```{python import_some}
import os
import logging
logging.getLogger("tensorflow").setLevel(logging.ERROR)
import warnings
warnings.simplefilter(action="ignore", category=UserWarning)
warnings.simplefilter(action="ignore", category=FutureWarning)

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import tensorflow as tf
print(tf.__version__)
if tf.test.is_gpu_available():
  print(tf.test.gpu_device_name())

import sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split
import joblib

print(sklearn.__version__)
```

```{python mkdir}
# Create model dir to cache all models trained in the notebook.
model_dir = "models"
if not os.path.exists(model_dir):
    os.makedirs(model_dir)

# Directory to cache dataset.
home = os.path.expanduser("~")
cache_dir = os.path.join(home, ".keras")
```

First,
we prepare the movie review dataset.^[Keras also comes with the dataset preprocessed as integer sequences (`from tf.keras.datasets import imdb`).]

```{python maybe_download_imdb, results="hide"}
import tensorflow_datasets as tfds

# Load the data as tf.data.Dataset.
imdb = tfds.load(name="imdb_reviews", as_supervised=True,
                 data_dir=os.path.join(home, "tensorflow_datasets"))
```

The dataset is perfectly balanced with 50,000 examples,
half with positive and half with negative sentiment.

```{python prepare_imdb}
# Extract all texts as list since we want to use libraries other than tensorflow as well.
# And since this is a small dataset, we don't care about memory usage.
# We skip the use of a dataset iterator.
imdb_reviews_train = []
imdb_reviews_test = []
imdb_y_train = []
imdb_y_test = []
for x, y in imdb["train"].batch(128):
  imdb_reviews_train.extend(x.numpy())
  imdb_y_train.extend(y.numpy())
for x, y in imdb["test"].batch(128):
  imdb_reviews_test.extend(x.numpy())
  imdb_y_test.extend(y.numpy())

# TF works on bytes, but some other packages may only work on decoded string.
imdb_reviews_train = [b.decode("utf8") for b in imdb_reviews_train]
imdb_reviews_test = [b.decode("utf8") for b in imdb_reviews_test]
imdb_y_train = np.array(imdb_y_train)
imdb_y_test = np.array(imdb_y_test)

# Take one review.
print(imdb_reviews_train[87])

print(imdb_y_train[87])  # Label. 0 as negative and 1 as positive.
```

We use the data prepared by `tensorflow-datasets` here just to save some time.
For those who want to process the data in its original format (where each review is in its own `.txt` file),
the files can be downloaded with this piece of code:

```python
imdb_remote_path = "https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"
imdb_fname = os.path.basename(imdb_remote_path)
imdb_local_path = os.path.join(cache_dir, "datasets", imdb_fname)

if not os.path.exists(imdb_local_path):
  _ = tf.keras.utils.get_file(fname=imdb_fname, origin=imdb_remote_path,
                              extract=True, cache_dir=cache_dir)
```

### Explain Random Forest

Let's build a random forest with TF-IDF as our feature space.
We will use the popular `scikit-learn` library for implementation.^[It will be much faster if we choose `xgboost`'s or `lightgbm`'s implementation of random forest. However, to demonstrate compatibility of `lime` with `scikit-learn` we purposely choose the slower implementation here.]

```{python tfidf}
# We drop words that are too frequent or too rare in the training dataset.
imdb_vectorizer = TfidfVectorizer(lowercase=True, min_df=10, max_df=.9)
imdb_X_train = imdb_vectorizer.fit_transform(imdb_reviews_train)
imdb_X_test = imdb_vectorizer.transform(imdb_reviews_test)
print(len(imdb_vectorizer.vocabulary_))  # Without OOV token.
```

```{python imdb_rf}
imdb_rf_model_file = "models/text_rf.joblib"

# Save/reload the model to save notebook rendering time.
if os.path.exists(imdb_rf_model_file):
  imdb_rf = joblib.load(imdb_rf_model_file)
else:
  imdb_rf = RandomForestClassifier(n_estimators=300, random_state=64, n_jobs=-2)
  _ = imdb_rf.fit(imdb_X_train, imdb_y_train)
  _ = joblib.dump(imdb_rf, imdb_rf_model_file)

imdb_rf_pred = imdb_rf.predict(imdb_X_test)
imdb_rf_yhat = imdb_rf.predict_proba(imdb_X_test)[:,1]

print(classification_report(imdb_y_test, imdb_rf_pred))
print(roc_auc_score(imdb_y_test, imdb_rf_yhat))
```

As a baseline without extensive tuning (we didn't tune anything indeed!),
random forest seems to perform fairly well on this dataset.

As part of the algorithm's design we are able to derive a global view of feature importance.
This is based on how much each feature can reduce the impurity during all tree splittings.
For example,
we can plot the top 20 features:

```{python imdb_rf_feat_imp}
sorted_vocab = sorted(imdb_vectorizer.vocabulary_.items(), key=lambda kv: kv[1])
sorted_vocab = [w for w, i in sorted_vocab]

imdb_rf_feat_imp = pd.Series(imdb_rf.feature_importances_, index=sorted_vocab).sort_values()
ax = imdb_rf_feat_imp.tail(20).plot(kind="barh")
plt.show()
```

Interpretation of the impurity-based ranking must be done with care, though.
For example, highly correlated features should in theory have similar impact, but only one of them will gain a higher score (and suppress the other) in the ranking.
Which one stands out is essentially random.
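
A minimal sketch of this effect, using two nearly identical synthetic features (the data and seeds here are made up):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(64)
x1 = rng.normal(size=1000)
x2 = x1 + rng.normal(scale=.01, size=1000)  # nearly a copy of x1
y = (x1 > 0).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=64)
_ = rf.fit(np.column_stack([x1, x2]), y)

# Both features carry the same signal, yet the impurity-based
# importance is arbitrarily split between them.
print(rf.feature_importances_)
```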

In general it is NOT recommended to use impurity-based or even loss-based feature ranking to *interpret* a tree ensemble model.
Such ranking information is still useful for understanding different aspects of the model,
and can be used to subset features to counter over-fitting, if any.
But it won't really help explain the model per se.
And this is exactly why we need an explanation model in the first place.

Now move on to model explanation with LIME:

```{python lime_imdb_rf}
from lime.lime_text import LimeTextExplainer

# We need a pipeline since LimeTextExplainer.explain_instance expects raw text input.
imdb_rf_pipe = make_pipeline(imdb_vectorizer, imdb_rf)
imdb_rf_explainer = LimeTextExplainer(class_names=["Negative", "Positive"])

imdb_rf_tp_idx = np.where(np.logical_and(imdb_rf_pred == 1, imdb_y_test == 1))[0]
imdb_rf_fp_idx = np.where(np.logical_and(imdb_rf_pred == 1, imdb_y_test == 0))[0]

# We take one true positive and one false positive example to demo explanation.
imdb_rf_tp_exp = imdb_rf_explainer.explain_instance(
  imdb_reviews_test[imdb_rf_tp_idx[0]], imdb_rf_pipe.predict_proba, num_features=6)
imdb_rf_fp_exp = imdb_rf_explainer.explain_instance(
  imdb_reviews_test[imdb_rf_fp_idx[0]], imdb_rf_pipe.predict_proba, num_features=6)
# For ipynb, one can simply call imdb_tp_exp.show_in_notebook(text=True) to embed the html output.

imdb_rf_tp_exp.save_to_file("/tmp/explain_text_rf_tp.html")
imdb_rf_fp_exp.save_to_file("/tmp/explain_text_rf_fp.html")
```

#### A True Positive Prediction Explained {-}

```{r, echo=FALSE}
if ( !file.exists("lime.js") ) {
  write_lime_js("/tmp/explain_text_rf_tp.html")
}
htmltools::includeHTML(parse_lime_html_output("/tmp/explain_text_rf_tp.html"))
```

#### A False Positive Prediction Explained {-}

```{r, echo=FALSE}
htmltools::includeHTML(parse_lime_html_output("/tmp/explain_text_rf_fp.html"))
```

**TODO: Discuss the result.**

### Explain Neural Nets with Word Embeddings

Now let's try a neural network model with word embeddings trained from scratch.
We use `tensorflow.keras` API to quickly build and train a neural net.
We average word embeddings into a document embedding for each review,
then feed it through a ReLU layer before the sigmoid activation for cross-entropy optimization.

As an exercise,
instead of re-using the vocabulary built by `TfidfVectorizer` with `scikit-learn`,
we will re-tokenize the text data with the `keras.preprocessing` module.
The inherent consistency under the Keras framework will also simplify our later work on network layering.

```{python imdb_nn}
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Build vocabulary. We use similar size as in our previous TfidfVectorizer.
# Since we will use zero padding, 0 cannot be used as OOV index.
# Keras tokenizer by default reserves 0 already. OOV token, if used, will be indexed at 1.
# Note that len(tokenizer.index_word) will be all vocabulary instead of `num_words`.
vocab_size = 20001  # +1 for 0 index used for padding.
oov_token = "<unk>"
tokenizer = Tokenizer(lower=True, oov_token=oov_token, num_words=vocab_size)
tokenizer.fit_on_texts(imdb_reviews_train)

# Encode text with padding to ensure fixed-length input.
seq_train = tokenizer.texts_to_sequences(imdb_reviews_train)
seq_train_padded = pad_sequences(seq_train, padding="post")
maxlen = seq_train_padded.shape[1]
seq_test = tokenizer.texts_to_sequences(imdb_reviews_test)
seq_test_padded = pad_sequences(seq_test, padding="post", maxlen=maxlen)

assert tokenizer.index_word[1] == oov_token
assert seq_train_padded.max() == vocab_size - 1

# Wrap Keras Sequential model with scikit-learn API.
# This is because LimeTextExplainer seems buggy with a native Keras model.
nn_model_file = "models/text_clf_nn.h5"

def nn_model_fn():
  embedding_size = 64
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
      vocab_size, embedding_size, input_length=maxlen,
      mask_zero=True, name="word_embedding"),
    tf.keras.layers.GlobalAveragePooling1D(name="doc_embedding"),
    tf.keras.layers.Dense(embedding_size // 2, activation="relu", name="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid", name="sigmoid")
  ], name="nn_classifier")
  model.compile(optimizer="adam",
                loss="binary_crossentropy",
                metrics=["accuracy"])
  return model

nn_model_fn().summary(line_length=90)

imdb_nn = tf.keras.wrappers.scikit_learn.KerasClassifier(nn_model_fn)
if os.path.exists(nn_model_file):
  # Restore the model with wrapper.
  imdb_nn.model = tf.keras.models.load_model(nn_model_file)
  imdb_nn.classes_ = np.array([0, 1])
else:
  metrics = imdb_nn.fit(
    x=seq_train_padded, y=imdb_y_train,
    batch_size=256, epochs=10,
    validation_data=(seq_test_padded, imdb_y_test),
    validation_steps=20,
    callbacks=[
      tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=2),
      tf.keras.callbacks.ModelCheckpoint(nn_model_file, monitor="val_loss", save_best_only=True)
    ],
    verbose=2)

imdb_nn_yhat = np.squeeze(imdb_nn.predict(seq_test_padded))
imdb_nn_pred = (imdb_nn_yhat > .5).astype(int)

print(classification_report(imdb_y_test, imdb_nn_pred))
print(roc_auc_score(imdb_y_test, imdb_nn_yhat))
```

```{python lime_imdb_nn}
def nn_predict_fn(text):
  # This is for sklearn wrapper only.
  seq = tokenizer.texts_to_sequences(text)
  seq = pad_sequences(seq, padding="post", maxlen=maxlen)
  return imdb_nn.predict_proba(seq)

imdb_nn_explainer = LimeTextExplainer(class_names=["Negative", "Positive"])

# Explain the same examples as in RF.
imdb_nn_tp_exp = imdb_nn_explainer.explain_instance(
  imdb_reviews_test[imdb_rf_tp_idx[0]], nn_predict_fn, num_features=6)
imdb_nn_fp_exp = imdb_nn_explainer.explain_instance(
  imdb_reviews_test[imdb_rf_fp_idx[0]], nn_predict_fn, num_features=6)

imdb_nn_tp_exp.save_to_file("/tmp/explain_text_nn_tp.html")
imdb_nn_fp_exp.save_to_file("/tmp/explain_text_nn_fp.html")
```

```{r, echo=FALSE}
htmltools::includeHTML(parse_lime_html_output("/tmp/explain_text_nn_tp.html"))
```

```{r, echo=FALSE}
htmltools::includeHTML(parse_lime_html_output("/tmp/explain_text_nn_fp.html"))
```

**TODO: Discuss the difference between RF and NN.**

### Explain Transfer Learning

One step further,
let's use pre-trained word embeddings for the neural nets and build another explanation model.
We will use [GloVe](https://nlp.stanford.edu/projects/glove/) (@pennington2014glove).
We use just the smaller GloVe model since our dataset is quite small.
In building the GloVe embedding matrix, we need to take special care of the out-of-vocabulary token and the padding index since we will be using the Keras API.

```{python maybe_download_glove, results="hide"}
# Download GloVe pre-trained embeddings.
# The file is about 800MB so may take some time.
home = os.path.expanduser("~")
cache_dir = os.path.join(home, ".keras")
glove6b_remote_path = "http://nlp.stanford.edu/data/glove.6B.zip"
glove6b_local_path = os.path.join(cache_dir, "datasets", "glove.6B.50d.txt")
glove6b_fname = os.path.basename(glove6b_remote_path)
if not os.path.exists(glove6b_local_path):
  _ = tf.keras.utils.get_file(fname=glove6b_fname, origin=glove6b_remote_path,
                              extract=True, cache_dir=cache_dir)

glove_all = pd.read_csv(glove6b_local_path, sep=" ", header=None, index_col=0, quoting=3)
```

```{python imdb_transfer_learning_vocab}
# Map vocabulary to pre-trained embeddings.
matched_toks = []
for i, w in tokenizer.index_word.items():
  if i < vocab_size:
    if w in glove_all.index:
      matched_toks.append(w)
    else:
      matched_toks.append(oov_token)

# Note that the GloVe pre-trained embeddings do not include their own OOV token.
# We will use the global average embedding to represent the OOV token.
print(len([t for t in matched_toks if t == oov_token]))  # How many OOVs?

glove_all.loc[oov_token] = glove_all.values.mean(axis=0)
glove = glove_all.loc[matched_toks].values

# Append dummy 0-index vector to support padding.
glove = np.vstack([np.zeros((1, glove.shape[1])), glove])
print(glove.shape)
```

Now let's build the neural network.
Most of the code will be the same as before,
except that the `Embedding` layer is now initialized with a constant matrix.
We make the GloVe embeddings *trainable* so they can further adapt to our specific dataset.

```{python imdb_transfer_learning}
tr_model_file = "models/text_clf_tr.h5"

def tr_model_fn():
  embedding_size = glove.shape[1]
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
      vocab_size, embedding_size, input_length=maxlen,
      embeddings_initializer=tf.keras.initializers.Constant(glove),
      trainable=True, mask_zero=True, name="glove_embedding"),
    tf.keras.layers.GlobalAveragePooling1D(name="doc_embedding"),
    tf.keras.layers.Dense(embedding_size // 2, activation="relu", name="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid", name="sigmoid")
  ], name="tr_classifier")
  model.compile(optimizer="adam",
                loss="binary_crossentropy",
                metrics=["accuracy"])
  return model

tr_model_fn().summary(line_length=90)

imdb_tr = tf.keras.wrappers.scikit_learn.KerasClassifier(tr_model_fn)
if os.path.exists(tr_model_file):
  # Restore the model with wrapper.
  imdb_tr.model = tf.keras.models.load_model(tr_model_file)
  imdb_tr.classes_ = np.array([0, 1])
else:
  metrics = imdb_tr.fit(
    x=seq_train_padded, y=imdb_y_train,
    batch_size=256, epochs=20,
    validation_data=(seq_test_padded, imdb_y_test),
    validation_steps=20,
    callbacks=[
      tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=2),
      tf.keras.callbacks.ModelCheckpoint(tr_model_file, monitor="val_loss", save_best_only=True)
    ],
    verbose=2)

imdb_tr_yhat = np.squeeze(imdb_tr.predict(seq_test_padded))
imdb_tr_pred = (imdb_tr_yhat > .5).astype(int)

print(classification_report(imdb_y_test, imdb_tr_pred))
print(roc_auc_score(imdb_y_test, imdb_tr_yhat))
```

```{python lime_imdb_transfer_learning}
def tr_predict_fn(text):
  # This is for sklearn wrapper only.
  seq = tokenizer.texts_to_sequences(text)
  seq = pad_sequences(seq, padding="post", maxlen=maxlen)
  return imdb_tr.predict_proba(seq)

imdb_tr_explainer = LimeTextExplainer(class_names=["Negative", "Positive"])

# Explain the same examples as in RF.
imdb_tr_tp_exp = imdb_tr_explainer.explain_instance(
  imdb_reviews_test[imdb_rf_tp_idx[0]], tr_predict_fn, num_features=6)
imdb_tr_fp_exp = imdb_tr_explainer.explain_instance(
  imdb_reviews_test[imdb_rf_fp_idx[0]], tr_predict_fn, num_features=6)

imdb_tr_tp_exp.save_to_file("/tmp/explain_text_tr_tp.html")
imdb_tr_fp_exp.save_to_file("/tmp/explain_text_tr_fp.html")
```

```{r, echo=FALSE}
htmltools::includeHTML(parse_lime_html_output("/tmp/explain_text_tr_tp.html"))
```

```{r, echo=FALSE}
htmltools::includeHTML(parse_lime_html_output("/tmp/explain_text_tr_fp.html"))
```

**TODO: Discussion.**

### Explain Recurrent Neural Nets

As a final exercise on text classification,
let's experiment with the explanation model on a recurrent neural network (RNN).
Note that, even for a single layer, this will be prohibitively slow without a GPU.

```{python imdb_rnn}
rnn_model_file = "models/text_clf_rnn.h5"

def rnn_model_fn():
  embedding_size = 64
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
      vocab_size, embedding_size,
      input_length=maxlen, mask_zero=True, name="word_embedding"),
    tf.keras.layers.GRU(64, dropout=.2, name="GRU"),
    tf.keras.layers.Dense(1, activation="sigmoid", name="sigmoid")
  ], name="rnn_classifier")
  model.compile(optimizer="adam",
                loss="binary_crossentropy",
                metrics=["accuracy"])
  return model

rnn_model_fn().summary(line_length=90)

imdb_rnn = tf.keras.wrappers.scikit_learn.KerasClassifier(rnn_model_fn)
if os.path.exists(rnn_model_file):
  # Restore the model with wrapper.
  imdb_rnn.model = tf.keras.models.load_model(rnn_model_file)
  imdb_rnn.classes_ = np.array([0, 1])
else:
  metrics = imdb_rnn.fit(
    x=seq_train_padded, y=imdb_y_train,
    batch_size=32, epochs=10,
    validation_data=(seq_test_padded, imdb_y_test),
    validation_steps=20,
    callbacks=[
      tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=2),
      tf.keras.callbacks.ModelCheckpoint(rnn_model_file, monitor="val_loss", save_best_only=True)
    ],
    verbose=2)

imdb_rnn_yhat = np.squeeze(imdb_rnn.predict(seq_test_padded))
imdb_rnn_pred = (imdb_rnn_yhat > .5).astype(int)

print(classification_report(imdb_y_test, imdb_rnn_pred))
print(roc_auc_score(imdb_y_test, imdb_rnn_yhat))
```

Since the dataset is rather small,
we don't see any advantage of the RNN over a simple pooling embedding model.
Let's see how the explanation can differ, again, for the same two examples:

```{python lime_imdb_rnn}
def rnn_predict_fn(text):
  # This is for sklearn wrapper only.
  seq = tokenizer.texts_to_sequences(text)
  seq = pad_sequences(seq, padding="post", maxlen=maxlen)
  return imdb_rnn.predict_proba(seq)

imdb_rnn_explainer = LimeTextExplainer(class_names=["Negative", "Positive"])

# Explain the same examples as in RF.
imdb_rnn_tp_exp = imdb_rnn_explainer.explain_instance(
  imdb_reviews_test[imdb_rf_tp_idx[0]], rnn_predict_fn, num_features=6)
imdb_rnn_fp_exp = imdb_rnn_explainer.explain_instance(
  imdb_reviews_test[imdb_rf_fp_idx[0]], rnn_predict_fn, num_features=6)

imdb_rnn_tp_exp.save_to_file("/tmp/explain_text_rnn_tp.html")
imdb_rnn_fp_exp.save_to_file("/tmp/explain_text_rnn_fp.html")
```

```{r, echo=FALSE}
htmltools::includeHTML(parse_lime_html_output("/tmp/explain_text_rnn_tp.html"))
```

```{r, echo=FALSE}
htmltools::includeHTML(parse_lime_html_output("/tmp/explain_text_rnn_fp.html"))
```

**TODO: Summarize all text models.**

## On Tabular Data Classifier

Lots of data can be represented in tabular format.
Here we will use [UCI Heart Disease dataset](https://archive.ics.uci.edu/ml/datasets/Heart+Disease) for demo.
Particularly,
we use the Cleveland dataset which is commonly used in machine learning research.^[V.A. Medical Center, Long Beach and Cleveland Clinic Foundation:Robert Detrano, M.D., Ph.D.]

```{python maybe_download_ucihd, results="hide"}
ucihd_remote_path = "https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data"
ucihd_fname = os.path.basename(ucihd_remote_path)
ucihd_local_path = os.path.join(cache_dir, "datasets", ucihd_fname)

if not os.path.exists(ucihd_local_path):
  _ = tf.keras.utils.get_file(fname=ucihd_fname, origin=ucihd_remote_path,
                              extract=False, cache_dir=cache_dir)
```

```{python preprocess_ucihd}
ucihd_attr = [
  "age",
  "sex",
  "cp",  # chest pain type 1: typical angina 2: atypical angina 3: non-anginal pain 4: asymptomatic
  "trestbps",  # resting blood pressure (in mm Hg on admission to the hospital)
  "chol",  # serum cholestoral in mg/dl
  "fbs",  # (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
  "restecg",  # resting electrocardiographic results 0: normal 1: having ST-T wave abnormality 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
  "thalach",  # maximum heart rate achieved
  "exang",  # exercise induced angina (1 = yes; 0 = no)
  "oldpeak",  # ST depression induced by exercise relative to rest
  "slope",  #  the slope of the peak exercise ST segment
  "ca",  # number of major vessels (0-3) colored by flourosopy
  "thal",  # 3 = normal; 6 = fixed defect; 7 = reversable defect
  "label"  # diagnosis of heart disease (angiographic disease status) 0: < 50% diameter narrowing 1-4: > 50% diameter narrowing
]
ucihd = pd.read_csv(ucihd_local_path, header=None, names=ucihd_attr, na_values="?")
categorical_attr = ["cp", "fbs", "restecg", "exang", "thal"]
for col in categorical_attr:
  ucihd[col] = ucihd[col].astype("category")

# Clean label.
ucihd.loc[ucihd["label"] > 1, "label"] = 1

print(ucihd.shape)
print(ucihd.groupby("label").size())  # Label distribution.
print(ucihd.head())
```

The dataset contains both numerical and categorical features (all encoded in numerics already).

### Explain Random Forest

```{python ucihd_rf}
# RF doesn't allow missing value.
# For categorical (as string) we can leave one special category for missing,
# but for numerical we need to do some special encoding or imputation.
ucihd_2 = ucihd.copy()
ucihd_2.loc[ucihd_2["ca"].isna(), "ca"] = -1  # Encode missing numerical.

# One-hot encode all categorical features.
ucihd_2 = pd.get_dummies(ucihd_2, columns=categorical_attr, dummy_na=True)
ucihd_y = ucihd_2.pop("label")
ucihd_X_train, ucihd_X_test, ucihd_y_train, ucihd_y_test = train_test_split(
  ucihd_2, ucihd_y.values, test_size=.3, random_state=64)

ucihd_rf = RandomForestClassifier(n_estimators=100, random_state=64)
_ = ucihd_rf.fit(ucihd_X_train, ucihd_y_train)

ucihd_rf_yhat = ucihd_rf.predict_proba(ucihd_X_test)[:,1]
ucihd_rf_pred = ucihd_rf.predict(ucihd_X_test)

print(classification_report(ucihd_y_test, ucihd_rf_pred))
print(roc_auc_score(ucihd_y_test, ucihd_rf_yhat))
```

As one can see, RF performs very well on this dataset.

To explain a model trained with numerical features,
`lime` by default discretizes continuous variables into quantiles for ease of interpretation.
Discretization is done using statistics derived from the training dataset.
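
A rough sketch of what this quartile discretization amounts to (the feature name, values, and cutpoints below are all hypothetical): a continuous feature is bucketed by training-set quartiles, so the explanation can read like "age <= 48" instead of a raw coefficient on a raw value.

```python
import numpy as np

rng = np.random.default_rng(64)
age = rng.normal(loc=55, scale=9, size=1000)  # hypothetical training values

# Quartile cutpoints derived from the training data.
q25, q50, q75 = np.percentile(age, [25, 50, 75])

# Discretize a test value into one of the 4 buckets.
value = 45.0
bucket = int(np.searchsorted([q25, q50, q75], value))
labels = [f"<= {q25:.0f}", f"({q25:.0f}, {q50:.0f}]",
          f"({q50:.0f}, {q75:.0f}]", f"> {q75:.0f}"]
print(labels[bucket])
```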

```{python lime_ucihd_rf}
from lime.lime_tabular import LimeTabularExplainer

cat_ind = [i for i, col in enumerate(ucihd_2.columns) if "_" in col]
ucihd_rf_explainer = LimeTabularExplainer(
  ucihd_X_train.values, class_names=["Negative", "Positive"],
  feature_names=ucihd_2.columns,
  categorical_features=cat_ind)

ucihd_rf_tp_idx = np.where(np.logical_and(ucihd_rf_pred == 1, ucihd_y_test == 1))[0]
ucihd_rf_fp_idx = np.where(np.logical_and(ucihd_rf_pred == 1, ucihd_y_test == 0))[0]

# We take one true positive and one false positive for examples.
ucihd_rf_tp_exp = ucihd_rf_explainer.explain_instance(
  ucihd_X_test.iloc[ucihd_rf_tp_idx[0]], ucihd_rf.predict_proba, num_features=4)
ucihd_rf_fp_exp = ucihd_rf_explainer.explain_instance(
  ucihd_X_test.iloc[ucihd_rf_fp_idx[0]], ucihd_rf.predict_proba, num_features=4)

ucihd_rf_tp_exp.save_to_file("/tmp/explain_tab_rf_tp.html")
ucihd_rf_fp_exp.save_to_file("/tmp/explain_tab_rf_fp.html")
```

#### A True Positive Prediction Explained {-}

```{r, echo=FALSE}
htmltools::includeHTML(parse_lime_html_output("/tmp/explain_tab_rf_tp.html"))
```

#### A False Positive Prediction Explained {-}

```{r, echo=FALSE}
htmltools::includeHTML(parse_lime_html_output("/tmp/explain_tab_rf_fp.html"))
```

**Discuss the two explanations.**

### Explain Gradient Boosting Trees

Gradient boosting trees (GBT) is a powerful model family proven to work exceptionally well in many different applications.
Yet due to its ensembling nature,
GBT is also hard to intrepret in general.

Here we demo `lightgbm`'s implementation of GBT with LIME explanation.

```{python ucihd_lgb}
import lightgbm as lgb

ucihd_tr = lgb.Dataset(ucihd_X_train, label=ucihd_y_train)
ucihd_te = lgb.Dataset(ucihd_X_test, label=ucihd_y_test)

ucihd_lgb_params = {
  "learning_rate": .01,
  "boosting_type": "gbdt",
  "objective": "binary",
  "metric": ["binary_logloss", "auc"],
  "num_leaves": 4,
  "max_depth": 2,
  "min_data_per_leaf": 5,
  "verbose": -1,
  "seed": 64
}

ucihd_bst = lgb.train(
  params=ucihd_lgb_params,
  num_boost_round=300, early_stopping_rounds=20,
  train_set=ucihd_tr, valid_sets=[ucihd_te],
  verbose_eval=10)

ucihd_lgb_yhat = ucihd_bst.predict(ucihd_X_test)
ucihd_lgb_pred = (ucihd_lgb_yhat > .5).astype(int)

print(classification_report(ucihd_y_test, ucihd_lgb_pred))
print(roc_auc_score(ucihd_y_test, ucihd_lgb_yhat))
```

In this particular (rather small) dataset RF indeed outperforms GBT.
As a matter of fact,
based on [existing benchmark](https://github.com/interpretml/interpret/tree/master/benchmarks) a simple logistic regression may achieve an even higher score on this problem.
Nevertheless, let's move on to our explanation model with LIME:

```{python lime_ucihd_lgb}
def ucihd_lgb_predict_fn(x):
  # We need to output 2 columns for binary prob prediction.
  p = ucihd_bst.predict(x).reshape(-1, 1)
  return np.hstack((1 - p, p))

ucihd_lgb_explainer = LimeTabularExplainer(
  ucihd_X_train.values, class_names=["Negative", "Positive"],
  feature_names=ucihd_2.columns,
  categorical_features=cat_ind)

# We take the same examples previously explained in our RF explanation model.
ucihd_lgb_tp_exp = ucihd_lgb_explainer.explain_instance(
  ucihd_X_test.iloc[ucihd_rf_tp_idx[0]], ucihd_lgb_predict_fn, num_features=4)
ucihd_lgb_fp_exp = ucihd_lgb_explainer.explain_instance(
  ucihd_X_test.iloc[ucihd_rf_fp_idx[0]], ucihd_lgb_predict_fn, num_features=4)

ucihd_lgb_tp_exp.save_to_file("/tmp/explain_tab_lgb_tp.html")
ucihd_lgb_fp_exp.save_to_file("/tmp/explain_tab_lgb_fp.html")
```

```{r, echo=FALSE}
htmltools::includeHTML(parse_lime_html_output("/tmp/explain_tab_lgb_tp.html"))
```

```{r, echo=FALSE}
htmltools::includeHTML(parse_lime_html_output("/tmp/explain_tab_lgb_fp.html"))
```

**TODO: Discussion on the explanation.**

#### Optimized Categorical Encoding in `lightgbm` {-}

**This section is a digression on `lightgbm` usage.**

Since `lime`'s API requires us to prepare our dataset in a one-hot encoded representation,
our `lightgbm` code uses the same data pipeline as the `scikit-learn` random forest.
But that is actually not optimal for `lightgbm`.
The following code chunk showcases the best practice for encoding categoricals in `lightgbm`:
We don't encode them at all!

```{python lgb_best_practice}
# We leave both missings and categoricals as-is in the dataset.
ucihd_train, ucihd_test = train_test_split(ucihd, test_size=.3, random_state=64)
ucihd_tr = lgb.Dataset(
  ucihd_train.drop("label", axis=1), label=ucihd_train["label"],
  categorical_feature=categorical_attr,
  free_raw_data=False)
ucihd_te = lgb.Dataset(
  ucihd_test.drop("label", axis=1), label=ucihd_test["label"],
  categorical_feature=categorical_attr,
  free_raw_data=False)

ucihd_bst_2 = lgb.train(
  params=ucihd_lgb_params,
  num_boost_round=300, early_stopping_rounds=20,
  train_set=ucihd_tr, valid_sets=[ucihd_te],
  verbose_eval=-1)

ucihd_lgb_yhat = ucihd_bst_2.predict(ucihd_test.drop("label", axis=1))
ucihd_lgb_pred = (ucihd_lgb_yhat > .5).astype(int)

print(roc_auc_score(ucihd_test["label"], ucihd_lgb_yhat))
```

To summarize,
there are two very special properties of the `lightgbm` algorithm.
First, `lightgbm` treats missings natively as a special tree split point.
This allows us to keep the original missings as-is and in many cases results in better accuracy than imputation.^[`xgboost` was the first to introduce such missing-value treatment among the GBT packages. `lightgbm` follows.]

In addition,
`lightgbm` encodes categorical variables internally in a more efficient way,
so we don't even need to do one-hot encoding on our own.
Of course on this tiny dataset we won't see any noticeable difference,
but for large applications the impact can be huge.
Besides,
by skipping the one-hot encoding pipeline our code is much neater as well.

## On Image Classifier

**TODO: Use a pre-trained model?**

# Shapley Regression Values

**TODO: Theory Briefing here.**
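
In the meantime, here is a brute-force sketch that computes exact Shapley values for a tiny hand-made model by enumerating all coalitions. The model, the instance, and the zero baseline are all made up for illustration; "absent" features are simply replaced by the baseline:

```python
import itertools
import math

def model(x):  # a hypothetical model: additive terms plus one interaction
  return 2 * x[0] + x[1] + x[0] * x[2]

x = (1.0, 3.0, 2.0)
n = len(x)

# v(S): the model's output when only features in S are "present".
def v(S):
  masked = tuple(x[i] if i in S else 0.0 for i in range(n))
  return model(masked)

phi = []
for i in range(n):
  others = [j for j in range(n) if j != i]
  total = 0.0
  for r in range(n):
    for S in itertools.combinations(others, r):
      # Shapley weight: |S|! (n - |S| - 1)! / n!
      weight = math.factorial(len(S)) * math.factorial(n - len(S) - 1) / math.factorial(n)
      total += weight * (v(set(S) | {i}) - v(set(S)))
  phi.append(total)

print(phi)  # per-feature attributions
# Efficiency property: the attributions sum to f(x) - f(baseline).
print(sum(phi), v(set(range(n))) - v(set()))
```

Note how the interaction term `x[0] * x[2]` is split equally between features 0 and 2, which is exactly the fair-allocation behavior Shapley values are designed for.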

# SHAP

@NIPS2017_7062 propose SHAP (**SHapley Additive exPlanations**),
yet another additive feature attribution method for model explainability.
It is a more general approach of which LIME is indeed a special case.
Just like LIME,
in theory it can be applied to *any* machine learning model,
but it comes with a customized fast implementation particularly for gradient boosting trees (GBT).
It supports APIs of well-known GBT libraries such as
[`xgboost`](https://github.com/dmlc/xgboost),
[`lightgbm`](https://github.com/microsoft/LightGBM),
and [`catboost`](https://github.com/catboost/catboost).

The interpretability provided by SHAP is again *local*.
It assigns each feature an importance value *for a particular prediction.*
Hence it provides for any given model prediction what may be the driving force for the model to make such prediction.

`shap` also comes with more visualization methods for feature investigation,
especially for feature interaction exploration.

**TODO: Theory Briefing here.**

## On Text Classifiers

### Explain Random Forest

`shap.TreeExplainer` is optimized for GBT but not for RF.
For a model with a high-dimensional feature space, such as a bag-of-words model, a non-GBT model will suffer from high computation cost.
Hence we will skip the discussion on RF and move forward to a GBT implementation.

### Explain Gradient Boosting Trees

In the previous section we didn't train a GBT for the text classification problem.
So let's quickly build one such model first (with the same TF-IDF vectorization as we did for the random forest model).

```{python imdb_lgb}
# lightgbm does not allow utf-8 encoded feature names.
# Since important tokens are most likely ascii-compatible for our dataset,
# we simply replace non-ascii characters as a workaround for this exercise.
def remove_non_ascii(s):
  return "".join([i if ord(i) < 128 else "_" for i in s])

sorted_vocab_ascii = [remove_non_ascii(v) for v in sorted_vocab]

imdb_X_tr = lgb.Dataset(imdb_X_train, label=imdb_y_train, feature_name=sorted_vocab_ascii)
imdb_X_te = lgb.Dataset(imdb_X_test, label=imdb_y_test, feature_name=sorted_vocab_ascii)

imdb_lgb_params = {
  "learning_rate": .05,
  "boosting_type": "gbdt",
  "objective": "binary",
  "metric": ["binary_logloss", "auc"],
  "num_leaves": 16,
  "max_depth": 4,
  "min_data_per_leaf": 20,
  "verbose": -1
}

imdb_lgb_model_file = "models/text_clf_lgb.txt"

# Save/reload model to save notebook rendering time.
if os.path.exists(imdb_lgb_model_file):
  # TODO:
  # Parameters are not loaded back? A bug? (Which cause the subsequent call to shap_values fail.)
  # https://github.com/microsoft/LightGBM/issues/2613
  imdb_bst = lgb.Booster(model_file=imdb_lgb_model_file, params=imdb_lgb_params)
else:
  imdb_bst = lgb.train(
    params=imdb_lgb_params,
    num_boost_round=1000, early_stopping_rounds=20,
    train_set=imdb_X_tr, valid_sets=[imdb_X_te],
    verbose_eval=100)
  _ = imdb_bst.save_model(imdb_lgb_model_file)

imdb_lgb_yhat = imdb_bst.predict(imdb_X_test)
imdb_lgb_pred = (imdb_lgb_yhat > .5).astype(int)

print(classification_report(imdb_y_test, imdb_lgb_pred))
print(roc_auc_score(imdb_y_test, imdb_lgb_yhat))
```

Just like RF, we have access to the overall feature importance with a GBT model:^[By default `lightgbm` calculates the importance by counting how many times a feature contributes to an optimal split during model training.
It also supports the impurity-based approach when the argument `importance_type` is set to `"gain"`.]

```{python imdb_lgb_feat_imp}
ax = lgb.plot_importance(imdb_bst, max_num_features=20)
plt.show()
```

Now for the explanation model:
since `shap.TreeExplainer` is optimized for speed on GBT,
we can feed in all the testing examples and calculate their shap values at once.

```{python shap_imdb_lgb}
import shap

# Sparse matrix is supported by shap for lightgbm models.
imdb_lgb_explainer = shap.TreeExplainer(imdb_bst)
imdb_lgb_shap_values = imdb_lgb_explainer.shap_values(imdb_X_test)

def imdb_lgb_shap_plot(test_id, matplotlib=True):
  shap_plt = shap.force_plot(
    imdb_lgb_explainer.expected_value[1],
    imdb_lgb_shap_values[1][test_id,:],
    imdb_X_test[test_id,:].toarray(),  # We still need a dense matrix here.
    feature_names=sorted_vocab,
    matplotlib=matplotlib
  )
  return shap_plt
```

#### Global Importance {-}

We can derive global importance based on shap values.
Note that this is different from the loss/impurity-based or split-count-based feature ranking derived from RF/GBT *during training*:
it is an aggregation of all the local prediction explanations (contributions) over the *testing* data.
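
Conceptually, the bar ranking produced by `summary_plot` is just the mean absolute shap value per feature across examples. A minimal numpy sketch with made-up shap values and hypothetical token names:

```python
import numpy as np

# Hypothetical shap values: 4 examples x 3 features (made-up numbers).
shap_values = np.array([
    [ 0.8, -0.1,  0.0],
    [-0.5,  0.2,  0.1],
    [ 0.9, -0.3,  0.0],
    [-0.7,  0.1, -0.2],
])
feature_names = ["good", "bad", "movie"]  # Hypothetical tokens.

# Global importance = mean absolute contribution per feature.
global_imp = np.abs(shap_values).mean(axis=0)
ranking = [feature_names[i] for i in np.argsort(global_imp)[::-1]]
print(ranking)  # "good" dominates: it has the largest average |contribution|.
```

Note that taking the mean of signed values instead would let positive and negative contributions cancel out, which is why the absolute value is used.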

```{python shap_imdb_lgb_feat_imp}
shap.summary_plot(imdb_lgb_shap_values, imdb_X_test, feature_names=sorted_vocab,
                  plot_type="bar", max_display=20, show=False)
plt.show()
```

#### Local Explanation {-}

```{python}
imdb_lgb_shap_plot(imdb_rf_tp_idx[0])
```

```{python}
imdb_lgb_shap_plot(imdb_rf_fp_idx[0])
```

**TODO: Discuss the result. Anything different from LIME?**

By default `shap` for `lightgbm` shows log-odds rather than probability in the plot.
To verify this:

```{python verify_log_odds}
# Take the first true positive to examine:
p = imdb_bst.predict(imdb_X_test[imdb_rf_tp_idx[0],:].toarray())
print(p)
print(np.log(p / (1 - p)))
```
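
The shap values in log-odds space satisfy the additivity property: the base (expected) value plus the per-feature contributions recovers the model's raw margin, which maps to a probability through the sigmoid. A sketch of this identity with made-up numbers:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def logit(p):
    return np.log(p / (1 - p))

# Hypothetical base value and per-feature contributions in log-odds space.
base_value = -0.4
contributions = np.array([1.2, -0.3, 0.8])

# Additivity: shap values sum to the raw model output (log-odds).
margin = base_value + contributions.sum()
prob = sigmoid(margin)

# Mapping the probability back gives the same log-odds,
# which is exactly what the chunk above verifies against the model.
assert np.isclose(logit(prob), margin)
print(prob)
```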

### Explain Neural Nets with Word Embeddings

As of `r format(Sys.time(), '%Y-%m-%d')` `DeepExplainer` does not yet support TF 2.0.^[https://github.com/slundberg/shap/issues/850.]
And `GradientExplainer` is not yet well documented for TF 2.0.
So we will use `KernelExplainer`, an implementation-agnostic explainer in `shap`.
The compromise is that it runs very slowly for each prediction.
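
The slowness comes from the fact that a model-agnostic Shapley estimate must probe the model over many feature coalitions; done exactly, the cost is exponential in the number of features. To make this concrete, here is a brute-force sketch of exact Shapley values for a toy 3-feature model (`toy_model` and the baseline are made up for illustration; `KernelExplainer` approximates this with a weighted regression rather than full enumeration):

```python
import math
import numpy as np
from itertools import permutations

def toy_model(x):
    # A made-up model: weighted sum plus an interaction term.
    return 2 * x[0] + x[1] + x[0] * x[2]

x = np.array([1.0, 2.0, 3.0])          # Instance to explain.
baseline = np.array([0.0, 0.0, 0.0])   # Reference ("missing feature") values.
n = len(x)

phi = np.zeros(n)
for order in permutations(range(n)):   # n! orderings: the cost explodes with n.
    z = baseline.copy()
    prev = toy_model(z)
    for j in order:
        z[j] = x[j]                    # Reveal feature j.
        cur = toy_model(z)
        phi[j] += cur - prev           # Marginal contribution of j in this order.
        prev = cur
phi /= math.factorial(n)

# Efficiency property: contributions sum to f(x) - f(baseline).
assert np.isclose(phi.sum(), toy_model(x) - toy_model(baseline))
print(phi)
```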

```{python shap_kernel_imdb_nn}
imdb_exp_ind = np.array([imdb_rf_tp_idx[0], imdb_rf_fp_idx[0]])
# KernelExplainer.
def mm(X):
  return imdb_tr.predict_proba(X)[:,1]

imdb_nn_shap_explainer = shap.KernelExplainer(mm, seq_train_padded[:100])
# This is VERY slow...
imdb_nn_kernel_shap_values = imdb_nn_shap_explainer.shap_values(seq_test_padded[imdb_exp_ind])
# TODO:
# Contributions are attributed to the original sequence input.
# To make the explanation readable we need to map each position
# back to its word id and then to the word itself.
```

```python
# TODO: Make sure everything works here.

# shap does not support a keras model in the scikit-learn wrapper.
# Let's re-build the model and retain its Sequential class.
dl_model = model_fn()
metrics = dl_model.fit(
  x=seq_train_padded, y=imdb_y_train,
  batch_size=256, epochs=20,
  validation_data=(seq_test_padded, imdb_y_test),
  validation_steps=20,
  callbacks=[
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=2),
    tf.keras.callbacks.ModelCheckpoint(tr_model_file, monitor="val_loss", save_best_only=True)
  ],
  verbose=0)

# DeepExplainer.
dl_shap_explainer = shap.DeepExplainer(dl_model, seq_train_padded)  # Won't work.

# GradientExplainer.
imdb_nn_shap_explainer = shap.GradientExplainer(dl_model, seq_train_padded[:100])

imdb_nn_shap_explainer = shap.GradientExplainer(
  (imdb_tr.layers[0].input, imdb_tr.layers[-1].output),  # Not working for TF 2.0.
  seq_train_padded[:100])
imdb_nn_shap_explainer.shap_values(seq_test_padded[:3])  # Error here.
```

## On Tabular Data Classifier

### Explain Random Forest

```{python shap_ucihd_rf}
ucihd_rf_explainer = shap.TreeExplainer(ucihd_rf)
ucihd_rf_shap_values = ucihd_rf_explainer.shap_values(ucihd_X_test)

def ucihd_rf_shap_plot(test_id, matplotlib=True):
  shap_plt = shap.force_plot(
    ucihd_rf_explainer.expected_value[1],
    ucihd_rf_shap_values[1][test_id,:],
    ucihd_X_test.iloc[[test_id]],
    matplotlib=matplotlib
  )
  return shap_plt
```

#### Global Feature Importance {-}

##### Split-time-based feature ranking {-}

```{python ucihd_rf_feat_imp}
ucihd_rf_feat_imp = pd.Series(ucihd_rf.feature_importances_, index=ucihd_X_train.columns).sort_values()
ax = ucihd_rf_feat_imp.tail(10).plot(kind="barh")
plt.show()
```

##### Shap value feature ranking {-}

```{python shap_ucihd_rf_feat_imp}
shap.summary_plot(ucihd_rf_shap_values, ucihd_X_train,
                  plot_type="bar", max_display=10, show=False)
plt.show()
```

#### Local Explanation {-}

```{python}
ucihd_rf_shap_plot(ucihd_rf_tp_idx[0])
```

```{python}
ucihd_rf_shap_plot(ucihd_rf_fp_idx[0])
```

### Explain Gradient Boosting Trees

For GBT we feed in the optimized model,
in which categoricals are encoded internally without explicit one-hot encoding.

```{python shap_ucihd_lgb}
ucihd_lgb_explainer = shap.TreeExplainer(ucihd_bst_2)
ucihd_lgb_shap_values = ucihd_lgb_explainer.shap_values(ucihd_test.drop("label", axis=1))

def ucihd_lgb_shap_plot(test_id, matplotlib=True):
  shap_plt = shap.force_plot(
    ucihd_lgb_explainer.expected_value[1],
    ucihd_lgb_shap_values[1][test_id,:],
    ucihd_test.iloc[[test_id]].drop("label", axis=1),
    matplotlib=matplotlib
  )
  return shap_plt
```

#### Global Feature Importance {-}

##### Split-time-based feature ranking {-}

```{python ucihd_lgb_feat_imp}
ax = lgb.plot_importance(ucihd_bst_2, max_num_features=10)
plt.show()
```

##### Shap value feature ranking {-}

```{python shap_ucihd_lgb_feat_imp}
shap.summary_plot(ucihd_lgb_shap_values, ucihd_test.drop("label", axis=1),
                  plot_type="bar", max_display=10, show=False)
plt.show()
```

#### Local Explanation {-}

```{python}
ucihd_lgb_shap_plot(ucihd_rf_tp_idx[0])
```

```{python}
ucihd_lgb_shap_plot(ucihd_rf_fp_idx[0])
```

### The Impact of One-Hot Encoding On Explanation

As one may now realize,
by explicitly one-hot-encoding the categorical features we essentially split them into separate features in the interpretable representation.
This can be either good or bad, depending on the actual use case.
In this particular respect, a library such as `lightgbm` provides the flexibility to choose whether or not to do the one-hot encoding.
So the way we want to construct the explanation model may well affect how we implement the original model!
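
To see the split concretely, here is a minimal sketch with a made-up categorical column (the feature name is hypothetical, not from the heart disease dataset):

```python
import pandas as pd

# A made-up categorical feature for illustration.
df = pd.DataFrame({"chest_pain_type": ["typical", "atypical", "none", "atypical"]})

# With explicit one-hot encoding, one categorical becomes several binary
# features, and each of them receives its own attribution in the explanation.
onehot = pd.get_dummies(df, columns=["chest_pain_type"])
print(list(onehot.columns))

# Without it (as with lightgbm's native categorical support), the explanation
# attributes a single contribution to "chest_pain_type" as a whole.
```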

## On Image Classifier

**TODO: Use a pre-trained model?**

# Explainable Boosting Machine

@nori2019interpretml published the open source package `interpret`, a fast implementation of **Generalized Additive Models with Pairwise Interactions, or GA<sup>2</sup>M** (@lou2013accurate).
As of `r format(Sys.time(), '%Y-%m-%d')`, `interpret` is still in its alpha release with limited documentation.
The library contains two groups of modeling frameworks:

+ `glassbox`: explainable machine learning models
+ `blackbox`: machine learning explanation models (such as LIME and SHAP)

We've already covered the mainstream approaches in the second group,
i.e.,
models that locally approximate the original model (treated as a blackbox) for better explainability.
The more interesting part of `interpret` is that it brings another type of model that is readily interpretable by construction,
and yet still competitively accurate:
**the Explainable Boosting Machine**, or EBM.

EBM is an additive model of the form:

$$
g(E(y)) = \beta_0 + \sum f_j (x_j) + \sum f_{ij}(x_i, x_j),
$$

where $g(\cdot)$ is a link function (e.g. sigmoid for binary classification),
and $f_j$ is the *feature function* for the $j$-th feature,
learned by a gradient boosting machine using only that one feature at a time, in a round-robin fashion over all features.
$f_{ij}$ is a *pairwise interaction* feature function that further boosts the accuracy of the model while remaining interpretable.

The model is interpretable since the contribution of any individual feature can be directly quantified by its corresponding feature function $f_j$.
Such explanation extends up to pairwise interactions if the pairwise feature functions are also estimated.
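
The round-robin, one-feature-at-a-time boosting can be sketched in a few lines. This is a bare-bones regression sketch on synthetic data (identity link, binned feature functions fit as per-bin residual means, no interactions, no bagging), not the actual `interpret` implementation:

```python
import numpy as np

# Synthetic additive data: y = sin(3*x0) + x1^2 + noise.
rng = np.random.RandomState(42)
X = rng.uniform(-1, 1, size=(500, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, 500)

n_bins, lr, n_rounds = 16, 0.5, 50
# Bin each feature; f_j is stored as a per-bin additive contribution.
edges = np.linspace(-1, 1, n_bins + 1)[1:-1]
bins = [np.digitize(X[:, j], edges) for j in range(2)]
f = np.zeros((2, n_bins))

intercept = y.mean()
pred = np.full_like(y, intercept)
for _ in range(n_rounds):
    for j in range(2):                      # Round-robin over features.
        resid = y - pred
        for b in range(n_bins):             # Fit a small step of f_j per bin.
            mask = bins[j] == b
            if mask.any():
                step = lr * resid[mask].mean()
                f[j, b] += step
                pred[mask] += step

mse_before = np.mean((y - intercept) ** 2)
mse_after = np.mean((y - pred) ** 2)
print(mse_before, mse_after)  # Boosting should reduce the training error.
```

After training, `f[j]` is exactly the lookup table that makes the model interpretable: the contribution of feature $j$ to any prediction is read off directly from its bin.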

**TODO: How to detect pairwise interaction? Brief the FAST algorithm.**

## On Text/Image Data

EBM is not efficient for text datasets.
Due to the algorithm's design it would run far too long on a bag-of-words model, since there are too many feature functions to estimate.
If we fit an EBM on the movie review dataset,
even though it is not a large one,
we will encounter an OOM (out-of-memory) issue.
As a result,
we will skip the discussion of EBM on a text classifier.
(The same restriction applies to image datasets.)

## On Tabular Data

`ExplainableBoostingClassifier` has a `scikit-learn`-style API and hence is straightforward to use.

```{python ucihd_ebm}
from interpret.glassbox import ExplainableBoostingClassifier

ucihd_ebm = ExplainableBoostingClassifier(
  n_estimators=16, feature_names=ucihd_2.columns, n_jobs=1)
_ = ucihd_ebm.fit(ucihd_X_train, ucihd_y_train)

ucihd_ebm_yhat = ucihd_ebm.predict_proba(ucihd_X_test)[:,1]
ucihd_ebm_pred = (ucihd_ebm_yhat > .5).astype(int)

print(classification_report(ucihd_y_test, ucihd_ebm_pred))
print(roc_auc_score(ucihd_y_test, ucihd_ebm_yhat))
```

The model performs very well on the heart disease dataset,
outperforming both RF and GBT.

### Global Explanation

`interpret` comes with a rich set of visualization tools (with [`plotly`](https://plot.ly/) as its backend).
Model explanation is divided into two groups:
global and local.

For global explanation,
we have access to both the global feature importance and per-feature contribution statistics.
```{python ucihd_ebm_global_explain}
ucihd_ebm_global = ucihd_ebm.explain_global()
# All feature info:
print(ucihd_ebm_global.selector)

# Global feature importance.
ucihd_ebm_global.visualize().write_html("/tmp/ucihd_ebm_feat_imp.html", include_plotlyjs=False)

# Global contribution of the first feature.
ucihd_ebm_global.visualize(0).write_html("/tmp/ucihd_ebm_age_imp.html", include_plotlyjs=False)

# Global contribution of the second feature.
ucihd_ebm_global.visualize(1).write_html("/tmp/ucihd_ebm_sex_imp.html", include_plotlyjs=False)
```

#### Feature Importance {-}

```{r, echo=FALSE}
htmltools::includeHTML("/tmp/ucihd_ebm_feat_imp.html")
```

#### Feature Contribution: Age {-}

```{r, echo=FALSE}
htmltools::includeHTML("/tmp/ucihd_ebm_age_imp.html")
```

#### Feature Contribution: Gender {-}

```{r, echo=FALSE}
htmltools::includeHTML("/tmp/ucihd_ebm_sex_imp.html")
```

### Local Explanation

More importantly,
we'd like to be able to explain a specific model prediction locally.
This can also be done easily with a couple of lines:

```{python ucihd_ebm_local_explain}
# Explain the same instances previously on RF.
ucihd_exp_ind = np.array([ucihd_rf_tp_idx[0], ucihd_rf_fp_idx[0]])

# We can feed multiple examples at the same time.
ucihd_ebm_local = ucihd_ebm.explain_local(
  ucihd_X_test.iloc[ucihd_exp_ind,:], ucihd_y_test[ucihd_exp_ind])
ucihd_ebm_local.visualize(0).write_html("/tmp/ucihd_ebm_exp_tp.html", include_plotlyjs=False)
ucihd_ebm_local.visualize(1).write_html("/tmp/ucihd_ebm_exp_fp.html", include_plotlyjs=False)
```

```{r, echo=FALSE}
htmltools::includeHTML("/tmp/ucihd_ebm_exp_tp.html")
```

```{r, echo=FALSE}
htmltools::includeHTML("/tmp/ucihd_ebm_exp_fp.html")
```

# References
